
Can someone give a better explanation of the special-character examples here, or provide some clearer examples?

(x)

The '(foo)' and '(bar)' in the pattern /(foo) (bar) \1 \2/ match and remember the first two words in the string "foo bar foo bar". The \1 and \2 in the pattern match the string's last two words.

Decimal point

For example, /.n/ matches 'an' and 'on' in "nay, an apple is on the tree", but not 'nay'.

Word boundary \b

/\w\b\w/ will never match anything, because a word character can never be followed by both a non-word and a word character.

non word boundary \B

/\B../ matches 'oo' in "noonday", and /y\B./ matches 'ye' in "possibly yesterday." I have no idea what these examples are showing :(

Much thanks!

yvonnezoe

1 Answer


Parentheses (aka capture groups)

Parentheses are used to mark a group of symbols in the regular expression whose match is 'remembered' in the match result. Each group is numbered by the order of its opening parenthesis and can be referred back to as \1, \2, and so on. In the example /(foo) (bar) \1 \2/ we remember the match foo as \1 and the match bar as \2. This means the string "foo bar foo bar" matches the regular expression, because the third and fourth words must repeat exactly the text captured by the first and second groups (i.e. (foo) and (bar)). You can use capture groups in JavaScript like this:

/id:(\d+)/.exec("the item has id:57") // => ["id:57", "57"]

Note that the returned array holds the whole match first, followed by each captured group in order.
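A small sketch to paste into a browser console or Node (the variable names and test strings are only for illustration). It shows that a backreference repeats the captured text rather than the group's pattern, and what exec() returns:

```javascript
// Backreferences: \1 and \2 must repeat the exact text captured by
// groups 1 and 2, not just any text that fits the group's pattern.
const pattern = /(foo) (bar) \1 \2/;
const sameOrder = pattern.test("foo bar foo bar"); // true
const swapped = pattern.test("foo bar bar foo");   // false: \1 must be "foo" at that spot

// With a more general group, the backreference still demands the SAME text
// again: \b(\w+) \1\b finds a word written twice in a row.
const doubled = /\b(\w+) \1\b/.exec("this is is a test");
const repeatedWord = doubled[1]; // "is"

// exec() returns an array: the whole match at index 0, then one entry
// per capture group, numbered by the order of their opening parentheses.
const idMatch = /id:(\d+)/.exec("the item has id:57");
const whole = idMatch[0];    // "id:57"
const captured = idMatch[1]; // "57"

console.log(sameOrder, swapped, repeatedWord, whole, captured);
```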

Decimal point (aka wildcard)

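A decimal point (or dot) represents a single character that can have any value. This means the regular expression /.n/ matches any two-character sequence whose second character is an 'n'. So /.n/.test("on") // => true, /.n/.test("an") // => true, but /.n/.test("or") // => false. That is also why /.n/ doesn't match 'nay' in "nay, an apple is on the tree": the 'n' is the first character of the string, so there is nothing before it for the dot to match. DrC brings up a good point in the comments: the dot won't match a newline character. A string can contain newlines whatever flags you use, so this can matter; in JavaScript the m (multiline) flag only changes how ^ and $ behave, and it is the s (dotAll) flag, added in ES2018, that lets . match newlines.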
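A quick sketch of the dot behaviour described above, including the newline case. It assumes an engine with ES2018 support for the s flag (any current browser or Node release); the names are arbitrary:

```javascript
// "." matches exactly one character of any kind except line terminators.
const dotN = /.n/;
const results = ["on", "an", "or"].map((s) => dotN.test(s)); // [true, true, false]

// With the g flag, match() returns every non-overlapping match. The "n" in
// "nay" is the first character, so there is nothing before it for "." to use.
const found = "nay, an apple is on the tree".match(/.n/g); // ["an", "on"]

// Newlines: "." skips "\n" unless the s (dotAll) flag is set (ES2018+).
// The m (multiline) flag does not help; it only changes ^ and $.
const plain = /a.b/.test("a\nb");      // false
const dotAll = /a.b/s.test("a\nb");    // true
const multiline = /a.b/m.test("a\nb"); // false

console.log(results, found, plain, dotAll, multiline);
```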

Word boundaries

A word boundary matches the position between a word character and a non-word character, in either order. The start and end of the string count as non-word for this purpose, so a word at the very beginning or end of the string also has a boundary there. In JavaScript the word characters are the ASCII letters, the digits and the underscore, i.e. [A-Za-z0-9_] (MDN); non-word characters are everything else. The trick with word boundaries is that they are zero-width assertions, which means they don't consume a character. That's why /\w\b\w/ will never match: you can never have a word boundary between two adjacent word characters.
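One way to see that \b is zero-width is to replace every boundary position with a marker character. This is just an illustrative sketch; the strings and variable names are arbitrary:

```javascript
// \b is zero-width: it matches a position between characters, never a
// character itself. Replacing every \b with "|" makes those positions visible.
const marked = "foo bar".replace(/\b/g, "|"); // "|foo| |bar|"

// Digits and "_" are word characters too, so "snake_case" has no inner
// boundaries, while "-" is a non-word character and splits "co-op".
const mixed = "snake_case 42 co-op".replace(/\b/g, "|"); // "|snake_case| |42| |co|-|op|"

// Because \b consumes nothing, /\w\b\w/ would need a boundary between two
// adjacent word characters, which is impossible.
const never = /\w\b\w/.test("foo bar"); // false

// A word character, then a boundary, then a NON-word character does work:
// here it is the last "o" of "foo" followed by the space.
const wordThenSpace = /\w\b\W/.exec("foo bar")[0]; // "o "

console.log(marked, mixed, never, JSON.stringify(wordThenSpace));
```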

Non-word boundaries

The opposite of a word boundary: instead of matching a point that goes from non-word to word, or from word to non-word (i.e. the ends of a word), \B matches points between two characters of the same type. So in our first example, /\B../ matches at the first position in the string that lies between two characters of the same type, plus the next two characters. In "noonday" that position is between the first 'n' and the 'o' (position 0, before the first 'n', is a word boundary because the start of the string counts as non-word), so the two characters matched are "oo".

In the second example, /y\B./ looks for the character 'y' followed by a character of the same type (so a word character), and the '.' matches that second character. So in "possibly yesterday" it won't match the 'y' at the end of "possibly", because the next character is a space, which is a non-word character. It will match the 'y' at the beginning of "yesterday", because it's followed by a word character ('e'), which the '.' then includes in the match, giving 'ye'.
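The same marker trick works for \B, and match() reports where each of the question's examples starts. Again a sketch with arbitrary names, meant to be pasted into a console:

```javascript
// \B matches every position that is NOT a word boundary: between two word
// characters, or between two non-word characters.
const inside = "noonday".replace(/\B/g, "|"); // "n|o|o|n|d|a|y"

// /\B../: the first non-boundary position is 1 (between "n" and "o"), because
// position 0 is a boundary (start of string next to a word character).
const noon = "noonday".match(/\B../);
// noon[0] === "oo", noon.index === 1

// /y\B./: the "y" ending "possibly" is followed by a space (a boundary), so
// the first match is the "y" starting "yesterday", plus the "e" after it.
const yest = "possibly yesterday.".match(/y\B./);
// yest[0] === "ye", yest.index === 9

// Two non-word characters in a row also form a \B position.
const spaces = "a  b".replace(/\B/g, "|"); // "a | b"

console.log(inside, noon[0], noon.index, yest[0], yest.index, spaces);
```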

Overall, regular expressions are popular in many languages and rest on a sound theoretical foundation, so there's a lot of material on these characters. In general, JavaScript's regular expressions are very similar to Perl's and to PCRE (Perl Compatible Regular Expressions), but not exactly the same, so most of your questions about JavaScript regular expressions would be answered by any Perl or PCRE regex tutorial (of which there are many).

Hope that helps!

kieran
  • Nice answer. Very minor nitpick - . matches any character except newline. – DrC May 08 '13 at 06:05
  • @DrC yeah true but I guess typically that would only occur in explicit multiline mode? otherwise the newline just inherently wouldn't be in the string being matched because it would terminate at `$`? I guess I'll add in a clarification though – kieran May 08 '13 at 06:09
  • Thank you @kieran for this clear explanation! :D It enlightens me. :) I've never touched Perl either, but this has helped me a lot! :) – yvonnezoe May 08 '13 at 06:12