Regular Expressions

Explain regular expressions in Bash

Regular expressions, also called regex or regexp, are patterns used to match text. They allow complex pattern matching using special characters.

Regular expressions are used in:

  • Search and replace tools like grep, sed and awk.

  • Programming languages like Perl, Python, Java etc.

  • Text editors

  • Utilities like vim, less etc.

In Bash scripts, regular expressions are mainly used with the grep and sed commands and with the built-in =~ operator inside [[ ]].

Here is a table of common regular expression notations:

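Notation   Description
.          Matches any single character
*          Matches 0 or more of the preceding expression
^          Matches the beginning of the line
$          Matches the end of the line
[]         Matches any single character in the brackets
[^]        Matches any single character NOT in the brackets
\d         Matches any digit
\D         Matches any non-digit
\s         Matches any whitespace character
\S         Matches any non-whitespace character
\w         Matches alphanumeric characters and _
\W         Matches any character that is not alphanumeric or _
+          Matches 1 or more of the preceding expression
?          Matches 0 or 1 of the preceding expression
{n}        Matches exactly n times
{n,}       Matches n or more times
{n,m}      Matches at least n but not more than m times
|          OR: either expression 1 or expression 2 matches
()         Captures and groups
\number    References a previously matched group (back-reference)

Not every tool understands every notation. \d and \D are Perl-style shorthands that grep -E, sed and Bash's =~ do not support; use [0-9] and [^0-9] (or [[:digit:]]) instead. \s, \S, \w and \W are GNU extensions; the portable forms are [[:space:]], [^[:space:]], [[:alnum:]_] and [^[:alnum:]_]. The +, ?, {n}, | and () notations need extended syntax, which grep -E and sed -E enable and which =~ always uses.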

Examples:

grep -E '[0-9]{2}' file # Match two consecutive digits (grep -E does not support \d)

grep -E '[a-z]{3}' file # Match 3 lowercase letters

sed 's/hello/hi/g' file # Replace all 'hello' with 'hi'
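The notations from the table can be tried out without a data file. The following is a minimal sketch, assuming a grep and sed that accept -E (the GNU and BSD versions do). The helper names count_matches and swap_pair and the sample lines are made up for this illustration.

```shell
#!/usr/bin/env bash

# count_matches REGEX: hypothetical helper that counts how many of the
# five sample lines match an extended regular expression.
count_matches() {
    printf '%s\n' 'cat' 'cot' 'coat' 'dog 42' 'Dog' | grep -Ec "$1"
}

# swap_pair STRING: hypothetical helper that swaps "name=number" into
# "number=name" using two capturing groups and back-references.
swap_pair() {
    printf '%s\n' "$1" | sed -E 's/([a-z]+)=([0-9]+)/\2=\1/'
}

count_matches '^c.t$'        # . is one character: cat, cot          -> 2
count_matches '^c[a-z]+t$'   # + is one or more: cat, cot, coat      -> 3
count_matches '[0-9]{2}'     # {2} is exactly two digits: "dog 42"   -> 1
count_matches '^(d|D)og'     # | is alternation: "dog 42", "Dog"     -> 2
swap_pair 'width=80'         # prints 80=width
```

Passing the pattern in single quotes keeps the shell from touching characters such as $ and * before grep or sed sees them.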

Regular expressions allow you to define complex patterns to match text. They use special characters as "metacharacters" to define the pattern.


Delimiters

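Bash itself does not require delimiters around a regular expression. Delimiters belong to the tools that embed a pattern in a larger command, such as the s command of sed, where the character that follows s becomes the delimiter.

Common delimiter choices in sed are: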

  • '/' - Slash delimiters

  • '#' - Hash delimiters

  • '~' - Tilde delimiters
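Choosing the delimiter matters most when the pattern itself contains slashes. The sketch below runs the same substitution with each of the three delimiters in sed; the path value and the variable names are only illustrative.

```shell
#!/usr/bin/env bash

path='/usr/local/bin'   # sample value, chosen for this example

# With slash delimiters, every slash inside the pattern must be escaped.
with_slash=$(printf '%s\n' "$path" | sed 's/\/usr\/local/\/opt/')

# With another delimiter, the slashes in the pattern need no escaping.
with_hash=$(printf '%s\n' "$path" | sed 's#/usr/local#/opt#')
with_tilde=$(printf '%s\n' "$path" | sed 's~/usr/local~/opt~')

# All three lines print /opt/bin
printf '%s\n' "$with_slash" "$with_hash" "$with_tilde"
```

The three commands are equivalent; the second and third are simply easier to read.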

Operator =~

The =~ operator in Bash is used to match a regular expression against a string. It's part of Bash Conditional Expressions.

The syntax is:

[[ string =~ regex ]]

Where:

  • string is the string you want to match against the regex

  • regex is a POSIX extended regular expression, written unquoted and without delimiters

The test returns exit status 0 if the string matches the regex, 1 if it does not, and 2 if the regex is syntactically invalid.
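The exit status can be checked directly. In shell scripting, status 0 means success, so a successful match yields 0. The following sketch uses a made-up helper, match_status, that prints the status of the test instead of branching on it.

```shell
#!/usr/bin/env bash

# match_status STRING REGEX: hypothetical helper that prints the exit
# status of [[ STRING =~ REGEX ]]. $2 is left unquoted on purpose so
# that Bash treats it as a regular expression.
match_status() {
    local status=0
    [[ $1 =~ $2 ]] || status=$?
    echo "$status"
}

match_status "test12"  '[0-9]+$'   # prints 0: the string ends in digits
match_status "testabc" '[0-9]+$'   # prints 1: no digits at the end
match_status "test12"  '['         # prints 2: the regex is invalid
```

Because if runs its branch when the status is 0, an if [[ ... =~ ... ]] statement takes the "then" branch on a match.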

Some examples:

if [[ "test12" =~ [0-9][0-9]$ ]]; then
  echo "Matches!"
fi
# Prints "Matches!" because "test12" ends with two digits

if [[ "testabc" =~ [0-9][0-9]$ ]]; then
  echo "Matches!"
else
  echo "Does not match"
fi
# Prints "Does not match" because "testabc" does not end with two digits

You can also use capturing groups in the regex to extract parts of the string:

if [[ "test12" =~ ([0-9])([0-9])$ ]]; then
  first=${BASH_REMATCH[1]}  # first capturing group
  second=${BASH_REMATCH[2]} # second capturing group
fi

echo "$first"  # Prints 1
echo "$second" # Prints 2

So in short, the =~ operator in Bash allows you to match a string against a regular expression and extract parts of the string using capturing groups.
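Capturing groups become more useful with structured input. As a minimal sketch, the made-up function split_date below extracts the parts of a date written as YYYY-MM-DD. The date format and the function name are assumptions chosen for this example.

```shell
#!/usr/bin/env bash

# split_date STRING: hypothetical helper that pulls year, month and day
# out of a YYYY-MM-DD date using three capturing groups.
split_date() {
    # Storing the regex in a variable avoids quoting problems in [[ ]].
    local re='^([0-9]{4})-([0-9]{2})-([0-9]{2})$'
    if [[ $1 =~ $re ]]; then
        # BASH_REMATCH[0] holds the whole match; [1] to [3] hold the groups.
        echo "year=${BASH_REMATCH[1]} month=${BASH_REMATCH[2]} day=${BASH_REMATCH[3]}"
    else
        echo "not a date: $1"
    fi
}

split_date "2024-03-15"   # prints: year=2024 month=03 day=15
split_date "15/03/2024"   # prints: not a date: 15/03/2024
```

BASH_REMATCH is overwritten by every successful =~ test, so copy the values you need before running another match.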

The next script example demonstrates how to use regular expressions in Bash scripts.

#!/bin/bash

# Check if a string starts with "Hello"
if [[ "Hello World" =~ ^Hello ]]; then
    echo "String starts with Hello" 
fi

# Check if a string contains a digit (=~ does not support \d)
if [[ "test123" =~ [0-9] ]]; then
    echo "String contains a number"
fi

# Match two consecutive digits
echo "12" | grep -E '[0-9]{2}'

# Match any 3 letter word (-w restricts matches to whole words)
echo "the cat sat" | grep -Ew '[[:alpha:]]{3}'

# Match three consecutive lowercase letters
echo "abc123def" | grep -E '[a-z]{3}'

Note:

The script above uses the following fragment, which is explained in detail below:

if [[ "Hello World" =~ ^Hello ]]; then
    echo "String starts with Hello" 
fi

The code checks whether the string "Hello World" matches the regex ^Hello, that is, whether the string starts with "Hello". Since it does, the echo statement runs and prints the message.

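There are no delimiters around the regular expression ^Hello because the =~ operator in Bash does not use them. The = (or ==) and != operators inside [[ ]] do not take a regex at all: they compare the string against a shell glob pattern such as Hello*.

The =~ operator interprets the unquoted pattern after it as a POSIX extended regular expression. Any part of the pattern that is quoted is matched as a literal string, so the regex is normally written unquoted or stored in a variable first. This differs from languages such as Perl or JavaScript, where regex literals are written between slashes, for example /^Hello/.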
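The effect of quoting is easy to get wrong, so here is a small sketch that compares the variants side by side. It assumes Bash 3.2 or later with default compatibility settings; the variable names are made up for this example.

```shell
#!/usr/bin/env bash

s="Hello World"
unquoted=no; quoted=no; from_var=no; glob=no

# Unquoted: ^ is an anchor, so this matches at the start of the string.
[[ $s =~ ^Hello ]] && unquoted=yes

# Quoted: the pattern is a literal string, and $s contains no "^" character.
[[ $s =~ "^Hello" ]] && quoted=yes

# A regex stored in a variable and expanded unquoted is still a regex.
re='^Hello.*d$'
[[ $s =~ $re ]] && from_var=yes

# == compares against a glob pattern, not a regex.
[[ $s == Hello* ]] && glob=yes

echo "unquoted=$unquoted quoted=$quoted from_var=$from_var glob=$glob"
# prints: unquoted=yes quoted=no from_var=yes glob=yes
```

Keeping the regex in a variable is often the safest choice when the pattern contains spaces or characters that the shell would otherwise interpret.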


Conclusion

Regular expressions are an extremely useful and powerful tool for pattern matching in text. They allow you to match complex text patterns with a concise syntax. Some of the main advantages of regular expressions are:

Conciseness: Regular expressions allow you to represent complex patterns in a very compact form. This makes them ideal for searching large amounts of text.

Flexibility: Regular expressions can match a wide variety of patterns and text structures. They are very flexible and adaptable.

Speed: Regular expression engines are heavily optimized, so tools such as grep can scan large amounts of text quickly.

Readability: Once you are familiar with the syntax, regular expressions can be readable and self-documenting.

In summary, regular expressions:

  • Allow complex pattern matching in a concise and readable way

  • Are backed by well-optimized engines for text search

  • Are very flexible and can match a wide range of text patterns

  • Provide features like capturing groups, character classes, wildcards, etc.

Because of these advantages, regular expressions have become an essential tool for tasks like text processing, data validation, search-and-replace and many other uses. As a developer, understanding and leveraging regular expressions can greatly improve your productivity.

So in conclusion, regular expressions are an important and powerful tool to have in your toolkit due to their concise syntax, flexibility, speed and ability to match complex text patterns.


Disclaimer: I asked an AI (Rix) several questions to create this article. If you have questions, post a comment. I could use some feedback. If you like this content, encourage me to write more.