Regular expressions, also called regex or regexp, are patterns used to match text. They allow complex pattern matching using special characters.
Regular expressions are used in:
Text-processing tools such as grep, sed and awk.
Programming languages such as Perl, Python and Java.
Text editors and pagers such as vim and less.
In Bash, regular expressions are mainly used with the grep and sed commands and with the =~ operator inside [[ ]].
Here is a table of common regular expression notations:
Notation | Description |
. | Matches any single character |
* | Matches 0 or more of the preceding expression |
^ | Matches the beginning of the line |
$ | Matches the end of the line |
[] | Matches any single character in the brackets |
[^] | Matches any character NOT in the brackets |
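\d | Matches any digit (Perl/PCRE syntax, e.g. grep -P; with grep -E, sed and =~ use [0-9]) |
\D | Matches any non-digit (PCRE only; elsewhere use [^0-9]) |
\s | Matches any whitespace character (GNU extension; portable form: [[:space:]]) |
\S | Matches any non-whitespace character (GNU extension; portable form: [^[:space:]]) |
\w | Matches letters, digits and _ (GNU extension) |
\W | Matches any character other than letters, digits and _ (GNU extension) |
+ | Matches 1 or more of the preceding expression |
? | Matches 0 or 1 of the preceding expression |
{n} | Matches the preceding expression exactly n times |
{n,} | Matches the preceding expression n or more times |
{n,m} | Matches the preceding expression at least n and at most m times |
| (vertical bar) | Alternation: matches either the expression on its left or the one on its right |
() | Groups an expression and captures the text it matched |
\number | Back-reference: matches the same text as a previously captured group, e.g. \1 |
Note: +, ?, {}, | and () work as shown in extended syntax (grep -E, sed -E and Bash's =~). In basic syntax (plain grep and sed) they are written \+, \?, \{n,m\}, \| and \( \); of these, \+, \? and \| are GNU extensions.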
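A minimal sketch trying a few of the table's notations with grep -E. It assumes a grep that supports -o (GNU and BSD grep do; POSIX does not require it), which prints only the matched part of each line, one match per line. The sample text is made up for illustration.

```shell
#!/bin/bash
# Try a few notations from the table with grep -E (POSIX extended regex).
# grep -o prints only the part of the line that matched, one match per line.
text='cat dog bird 2024 aaa'

# [0-9]{4}: exactly four digits in a row (a portable stand-in for \d{4})
echo "$text" | grep -oE '[0-9]{4}'    # 2024

# cat|dog: alternation, either expression may match
echo "$text" | grep -oE 'cat|dog'     # cat, then dog (one per line)

# a+: one or more consecutive "a" characters
echo "$text" | grep -oE 'a+'          # a (from "cat"), then aaa

# [a-z]+$: one or more lowercase letters anchored at the end of the line
echo "$text" | grep -oE '[a-z]+$'     # aaa

# ^c.t: starts with "c", then any single character, then "t"
echo "$text" | grep -oE '^c.t'        # cat
```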
Examples:
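grep -E '[0-9]{2}' file     # Match lines containing two consecutive digits
grep -P '\d{2}' file        # Same, using Perl syntax (GNU grep built with PCRE support)
grep -E '[a-z]{3}' file     # Match lines containing 3 consecutive lowercase letters
sed 's/hello/hi/g' file     # Replace every 'hello' with 'hi' and print the result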
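The table lists () and \number; one common use is rearranging text in sed. The sketch below assumes sed -E (supported by GNU and BSD sed) and uses a hypothetical helper named swap_name. In the replacement part of s///, \1 and \2 insert what groups 1 and 2 matched.

```shell
#!/bin/bash
# () groups and \number back-references with sed.
# swap_name is a hypothetical helper: "First Last" -> "Last, First"
swap_name() {
  echo "$1" | sed -E 's/^([A-Za-z]+) ([A-Za-z]+)$/\2, \1/'
}

swap_name "John Smith"    # Smith, John

# Basic syntax (no -E): the grouping parentheses must be escaped.
# \+ is a GNU extension, so * is used here to stay portable.
echo "John Smith" | sed 's/^\([A-Za-z]*\) \([A-Za-z]*\)$/\2, \1/'   # Smith, John
```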
Regular expressions allow you to define complex patterns to match text. They use special characters as "metacharacters" to define the pattern.
Delimiters
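No, regular expressions in Bash do not need delimiters. grep takes the pattern as an ordinary (usually single-quoted) argument, and the =~ operator reads it as-is.
Delimiters do appear in sed commands, where one character separates the pattern, the replacement and the flags, as in s/pattern/replacement/flags. sed accepts any character other than a backslash or newline; the most common are: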
'/' - Slash delimiters
'#' - Hash delimiters
'~' - Tilde delimiters
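To illustrate how the delimiter choice works, the sketch below rewrites a made-up path three ways with sed; all three commands should print the same result. Choosing a delimiter other than "/" avoids escaping the slashes inside the pattern. The last line shows that =~ needs no delimiter at all.

```shell
#!/bin/bash
# The character right after "s" in a sed substitution is the delimiter.
path='/usr/local/bin/tool'

echo "$path" | sed 's/\/usr\/local/\/opt/'   # "/" delimiter: slashes in the pattern must be escaped
echo "$path" | sed 's#/usr/local#/opt#'      # same result with "#"
echo "$path" | sed 's|/usr/local|/opt|'      # same result with "|"
# each of the three lines prints: /opt/bin/tool

# Bash's own =~ operator takes the regex without any delimiter:
[[ $path =~ ^/usr/local ]] && echo "under /usr/local"
```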
Operator =~
The =~ operator in Bash matches a string against a regular expression. It is available only inside the [[ ]] conditional command.
The syntax is:
[[ string =~ regex ]]
Where:
string is the string you want to match against the regex
regex is a POSIX extended regular expression, written without delimiters; any part of it that is quoted is matched as a literal string
It returns 0 (success) if the string matches the regex, 1 if it does not, and 2 if the regex is syntactically invalid.
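The exit status described above can be observed directly. This is a small sketch with a hypothetical helper, match_status, that prints the status of [[ ... =~ ... ]]; the status is captured with || so that the function also behaves under set -e.

```shell
#!/bin/bash
# Print the exit status of a =~ test: 0 = match, 1 = no match, 2 = invalid regex.
# match_status is a hypothetical helper used only for this example.
match_status() {
  local string=$1 regex=$2 status=0
  [[ $string =~ $regex ]] || status=$?
  echo "$status"
}

match_status "abc" 'b'    # 0
match_status "abc" 'x'    # 1
match_status "abc" '('    # 2 (unbalanced parenthesis)
```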
Some examples:
if [[ "test12" =~ [0-9][0-9]$ ]]; then
echo "Matches!"
fi
# Prints "Matches!" because "test12" ends with two digits
if [[ "testabc" =~ [0-9][0-9]$ ]]; then
echo "Matches!"
else
echo "Does not match"
fi
# Prints "Does not match" because "testabc" does not end with two digits
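One practical consequence of writing the regex without delimiters is quoting: Bash treats any quoted part of the right-hand side of =~ as literal text. A common pattern, sketched here with made-up values, is to keep the regex in a variable and leave that variable unquoted.

```shell
#!/bin/bash
# Keep the regex in a variable and leave it unquoted on the right of =~.
# Quoting it ("$re") makes Bash look for the literal text ^[0-9]+$ instead.
re='^[0-9]+$'
input='2024'

if [[ $input =~ $re ]]; then
  echo "unquoted: regex match"     # printed
fi

if [[ $input =~ "$re" ]]; then
  echo "quoted: literal match"     # not printed: "2024" does not contain the text ^[0-9]+$
fi
```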
You can also use capturing groups in the regex to extract parts of the string. Bash stores the whole match in ${BASH_REMATCH[0]} and each group in ${BASH_REMATCH[1]}, ${BASH_REMATCH[2]} and so on:
if [[ "test12" =~ ([0-9])([0-9])$ ]]; then
first=${BASH_REMATCH[1]} # first capturing group
second=${BASH_REMATCH[2]} # second capturing group
fi
echo "$first" # Prints 1
echo "$second" # Prints 2
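Capturing groups are handy for splitting structured input. The sketch below assumes dates in YYYY-MM-DD form and uses a hypothetical helper, parse_date; it stores the regex in a variable so that it can be used unquoted.

```shell
#!/bin/bash
# Split a YYYY-MM-DD date into parts with capturing groups.
# parse_date is a hypothetical helper name used only for this example.
parse_date() {
  local re='^([0-9]{4})-([0-9]{2})-([0-9]{2})$'
  if [[ $1 =~ $re ]]; then
    # BASH_REMATCH[1..3] hold the text matched by each group
    echo "year=${BASH_REMATCH[1]} month=${BASH_REMATCH[2]} day=${BASH_REMATCH[3]}"
  else
    echo "not a date: $1" >&2
    return 1
  fi
}

parse_date "2024-05-17"                     # year=2024 month=05 day=17
parse_date "17/05/2024" || echo "rejected"  # error on stderr, then: rejected
```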
So in short, the =~ operator in Bash allows you to match a string against a regular expression and extract parts of the string using capturing groups.
The following script demonstrates how to use regular expressions in Bash scripts.
#!/bin/bash
# Check if a string starts with "Hello"
if [[ "Hello World" =~ ^Hello ]]; then
echo "String starts with Hello"
fi
# Check if a string contains a digit ([0-9] instead of \d, which POSIX regex does not define)
if [[ "test123" =~ [0-9] ]]; then
echo "String contains a number"
fi
# Match lines containing two consecutive digits
echo "12" | grep -E '[0-9]{2}'
# Match three consecutive word characters (\w is a GNU grep extension)
echo "the cat sat" | grep -E '\w{3}'
# Match three consecutive lowercase letters in the range a-z
echo "abc123def" | grep -E '[a-z]{3}'
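The individual checks in the script can be combined into one reusable test. classify is a hypothetical function name chosen for this sketch; it uses [0-9] and [A-Za-z] rather than \d and \w so that it does not depend on GNU extensions.

```shell
#!/bin/bash
# Classify a string with a chain of =~ tests (hypothetical helper).
classify() {
  if [[ $1 =~ ^[0-9]+$ ]]; then
    echo "number"                 # digits only
  elif [[ $1 =~ ^[A-Za-z]+$ ]]; then
    echo "word"                   # letters only
  elif [[ $1 =~ [0-9] ]]; then
    echo "contains a number"      # at least one digit somewhere
  else
    echo "other"
  fi
}

for s in 123 Hello test123 '!?'; do
  echo "$s -> $(classify "$s")"
done
# prints: 123 -> number, Hello -> word, test123 -> contains a number, !? -> other
```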
Note:
The script above starts with this fragment, which is explained in more detail below:
if [[ "Hello World" =~ ^Hello ]]; then
echo "String starts with Hello"
fi
The code checks whether the string "Hello World" matches the regex ^Hello, which is true when the string starts with "Hello". Since it does, the echo statement runs.
No delimiters are used around the regular expression ^Hello because the =~ operator in Bash does not use them: it interprets the pattern after it directly as a regex.
The = (or ==) and != operators inside [[ ]] do not use delimiters either; they compare the string against a glob pattern such as Hello*, not against a regular expression.
Delimiters belong to other languages: Perl and JavaScript write regex literals as /^Hello/, while Python passes the pattern as an ordinary string to functions such as re.search.
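A side-by-side sketch of the two kinds of matching inside [[ ]]: == and != use glob patterns, =~ uses regular expressions. The string is the same "Hello World" as in the fragment above; every test here is true, so each echo should run.

```shell
#!/bin/bash
# Glob matching (== and !=) versus regex matching (=~) inside [[ ]].
s='Hello World'

[[ $s == Hello* ]]  && echo "glob: starts with Hello"    # * = any string
[[ $s =~ ^Hello ]]  && echo "regex: starts with Hello"   # ^ = start of string
[[ $s != *[0-9]* ]] && echo "glob: contains no digit"
[[ ! $s =~ [0-9] ]] && echo "regex: contains no digit"
```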
Conclusion
Regular expressions are an extremely useful and powerful tool for pattern matching in text. They allow you to match complex text patterns with a concise syntax. Some of the main advantages of regular expressions are:
Conciseness: Regular expressions allow you to represent complex patterns in a very compact form, which makes them quick to write when searching or filtering large amounts of text.
Flexibility: Regular expressions can match a wide variety of patterns and text structures. They are very flexible and adaptable.
Speed: Mature regular expression engines, such as those in grep and sed, are heavily optimized and can scan large files quickly, although some patterns (especially those with back-references) can be slow.
Readability: Once you are familiar with the syntax, short regular expressions are easy to read; long ones benefit from an explanatory comment.
In summary, regular expressions:
Allow complex pattern matching in a concise and readable way
Are backed by fast, well-optimized tools for text search
Are very flexible and can match a wide range of text patterns
Provide features like capturing groups, character classes, wildcards, etc.
Because of these advantages, regular expressions have become an essential tool for tasks like text processing, data validation, search-and-replace and many other uses. As a developer, understanding and leveraging regular expressions can greatly improve your productivity.
Regular expressions are an important and powerful tool to have in your toolkit due to their concise syntax, flexibility, speed and ability to match complex text patterns.
Disclaimer: I asked an AI (Rix) several questions to create this article. If you have questions, post a comment. I would appreciate feedback, and if you like this content, encourage me to write more.