Regular Expressions Syntax

Suppose we are looking for a numeric digit; the regular expression to search for is [0-9]. The brackets indicate that the character being compared should match any one of the characters enclosed within the brackets. The dash (-) between 0 and 9 indicates a range from 0 to 9. Therefore, this regular expression matches any single character from 0 through 9, that is, any digit. To search for a special character literally, precede it with a backslash. For example, the single-character regular expression \* matches a single asterisk. The special characters are briefly described in the table below.
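
As a minimal sketch of the two ideas above (a bracketed range and a backslash-escaped special character), the following uses Python's re module. The choice of Python is an assumption made for illustration only; the text does not say which regular expression engine it describes, and the sample strings are made up.

```python
import re

# [0-9] matches any single digit; search() returns the first match found.
digit = re.search(r"[0-9]", "room 42")
first_digit = digit.group(0)

# With no digit in the target string there is nothing to match.
no_digit = re.search(r"[0-9]", "no digits here")

# The backslash makes the asterisk a literal character
# instead of a repetition operator.
star = re.search(r"\*", "2 * 3")
star_position = star.start()

print(first_digit, no_digit, star_position)
```

Here first_digit holds the first digit found, no_digit is None because nothing matched, and star_position is the index of the literal asterisk.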

Table 4-1. Regexp Control Characters
^ The caret (^) matches the beginning of the string. The expression ^A will match an A only at the beginning of the string.
^ A caret (^) immediately following the left bracket ([) has a different meaning: it excludes the remaining characters within the brackets from matching the target string. The expression [^0-9] indicates that the target character should not be a digit.
$ The dollar sign ($) matches the end of the string. The expression abc$ will match the sub-string abc only if it is at the end of the string.
| The alternation character (|) allows the expression on either side of it to match the target string. The expression a|b will match a as well as b.
. The dot (.) matches any single character.
* The asterisk (*) indicates that the character to its left may match zero or more times.
+ The plus sign (+) is similar to the asterisk, but the character to its left must match at least once.
? The question mark (?) matches the character to its left zero or one time.
() Parentheses affect the order of pattern evaluation.
[ ] Brackets ([ and ]) enclosing a set of characters indicate that any one of the enclosed characters may match the target character.
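
The control characters in Table 4-1 can be tried out with a short sketch such as the one below. It assumes Python's re module, which may differ in details from the engine this section describes (for example, Python's dot does not match a newline by default). The helper name matches and the sample strings are hypothetical.

```python
import re

def matches(pattern, text):
    """Return True if the pattern matches anywhere in text."""
    return re.search(pattern, text) is not None

# (pattern, target string, whether a match is expected)
examples = [
    (r"^A",      "Apple",  True),   # A at the beginning of the string
    (r"^A",      "bAt",    False),  # A present, but not at the beginning
    (r"[^0-9]",  "123",    False),  # every character is a digit
    (r"[^0-9]",  "12a",    True),   # the letter a is not a digit
    (r"abc$",    "xyzabc", True),   # abc at the end of the string
    (r"abc$",    "abcxyz", False),  # abc present, but not at the end
    (r"a|b",     "b",      True),   # either side of | may match
    (r"^a.c$",   "axc",    True),   # . stands for any one character
    (r"^ab*c$",  "ac",     True),   # * allows zero occurrences of b
    (r"^ab+c$",  "ac",     False),  # + requires at least one b
    (r"^ab?c$",  "abbc",   False),  # ? allows at most one b
    (r"^(ab)+$", "abab",   True),   # parentheses group ab for the +
]

results = [matches(pattern, text) for pattern, text, _ in examples]
for (pattern, text, expected), result in zip(examples, results):
    print(pattern, repr(text), result, "(expected %s)" % expected)
```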

Parentheses, besides affecting the evaluation order of the regular expression, also serve as tagged expressions, which act like a temporary memory. This memory can be used when replacing the matched text with a replace expression. In the replace expression, an & character stands for the sub-string that was found. So, if the sub-string that matched the regular expression is abcd, a replace expression of xyz&xyz will change it to xyzabcdxyz. The same replace expression can also be written as xyz\0xyz, where \0 is the tagged expression representing the entire matched sub-string. Similarly, other tagged expressions are represented by \1, \2, and so on. Note that although tagged expression 0 is always defined, tagged expressions 1, 2, etc. are defined only if the regular expression used in the search has enough sets of parentheses. Here are a few examples:

Table 4-2. Regexp Examples
String                       Search                                         Replace   Result
Mr.                          (Mr)(\.)                                       \1s\2     Mrs.
abc                          (a)b(c)                                        &-\1-\2   abc-a-c
bcd                          (a|b)c*d                                       &-\1      bcd-b
abcde                        (.*)c(.*)                                      &-\1-\2   abcde-ab-de
cde                          (ab|cd)e                                       &-\1      cde-cd
foo bar STOP: lkasdfkjakjlf  ([0-9,A-Z,a-z,\ ]*)(STOP:)([0-9,A-Z,a-z,\ ]*)  \1\2      foo bar STOP:
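
The rows of Table 4-2 can be reproduced with a rough sketch in Python. Two assumptions apply. First, Python's re module is used only for illustration and is not necessarily the engine described here. Second, Python writes the whole match as \g<0> rather than & or \0, so the hypothetical helper to_python_replacement translates the replace expression first. The translation is deliberately simplistic and does not handle an escaped, literal &.

```python
import re

def to_python_replacement(replace_expr):
    # Translate "&" and "\0" (the entire match) into Python's "\g<0>".
    # Numbered tags such as \1 and \2 already mean the same in Python.
    return replace_expr.replace("&", r"\g<0>").replace(r"\0", r"\g<0>")

def substitute(string, search, replace_expr):
    # Replace only the first match, as in the table's single-match examples.
    return re.sub(search, to_python_replacement(replace_expr), string, count=1)

# (string, search, replace, expected result) taken from Table 4-2
rows = [
    ("Mr.",   r"(Mr)(\.)",  r"\1s\2",   "Mrs."),
    ("abc",   r"(a)b(c)",   r"&-\1-\2", "abc-a-c"),
    ("bcd",   r"(a|b)c*d",  r"&-\1",    "bcd-b"),
    ("abcde", r"(.*)c(.*)", r"&-\1-\2", "abcde-ab-de"),
    ("cde",   r"(ab|cd)e",  r"&-\1",    "cde-cd"),
    ("foo bar STOP: lkasdfkjakjlf",
     r"([0-9,A-Z,a-z,\ ]*)(STOP:)([0-9,A-Z,a-z,\ ]*)",
     r"\1\2",
     "foo bar STOP:"),
]

outputs = [substitute(string, search, replace)
           for string, search, replace, _ in rows]
for (string, search, replace, expected), output in zip(rows, outputs):
    print("%r -> %r (table says %r)" % (string, output, expected))

# The \0 form is equivalent to &: both stand for the whole match.
wrapped = substitute("abcd", r"abcd", r"xyz\0xyz")
```

In the last row, the first tagged expression stops before STOP: only because the engine backtracks until the literal STOP: can match; the third tagged expression captures the trailing text, which the replace expression \1\2 then drops.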