On the dot
In regular expressions, the dot . matches almost any character. The only characters it doesn't normally match are the newline characters. For the dot to match all characters, you must enable what is called single-line mode (aka "dot all").
In C#, this is specified using RegexOptions.Singleline. You can also embed this as (?s) in the pattern.
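As a quick illustration of the difference, here is a minimal sketch. The class name DotDemo, its helper methods, and the sample pattern a.b are made up for this example; only Regex.IsMatch, RegexOptions.Singleline, and (?s) come from .NET itself.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DotDemo
{
    // Default mode: the dot does not match the newline character.
    public static bool MatchesDefault(string input) =>
        Regex.IsMatch(input, "a.b");

    // Single-line ("dot all") mode enabled through the options argument.
    public static bool MatchesSingleline(string input) =>
        Regex.IsMatch(input, "a.b", RegexOptions.Singleline);

    // The same mode enabled inline with (?s).
    public static bool MatchesInline(string input) =>
        Regex.IsMatch(input, "(?s)a.b");

    public static void Main()
    {
        Console.WriteLine(MatchesDefault("a-b"));      // True
        Console.WriteLine(MatchesDefault("a\nb"));     // False
        Console.WriteLine(MatchesSingleline("a\nb"));  // True
        Console.WriteLine(MatchesInline("a\nb"));      // True
    }
}
```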
On metacharacters and escaping
The . isn't the only regex metacharacter. The full set is:
( ) { } [ ] ? * + - ^ $ . | \
Depending on where they appear, if you want these characters to be matched literally (e.g. . as a period), you may need to do what is called "escaping". This is done by preceding the character with a \.
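To see what escaping changes, here is a small sketch. It leans on Regex.Escape, which is my addition rather than something mentioned above: it is the .NET helper that inserts the backslashes for you. The class name EscapeDemo and the sample text 3.14 are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EscapeDemo
{
    // Unescaped: the dot is a metacharacter, so it matches any character
    // between "3" and "14".
    public static bool UnescapedMatches(string input) =>
        Regex.IsMatch(input, "3.14");

    // Escaped: Regex.Escape puts a backslash before the dot, so only a
    // literal period matches.
    public static bool EscapedMatches(string input) =>
        Regex.IsMatch(input, Regex.Escape("3.14"));

    public static void Main()
    {
        Console.WriteLine(UnescapedMatches("3x14")); // True
        Console.WriteLine(EscapedMatches("3x14"));   // False
        Console.WriteLine(EscapedMatches("3.14"));   // True
    }
}
```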
Of course, \ is also the escape character in C# string literals. To get a literal \, you need to double it in your string literal (i.e. "\\" is a string of length one). Alternatively, C# also has what are called @-quoted string literals, where escape sequences are not processed. Thus, the following two strings are equal:
"c:\\Docs\\Source\\a.txt"
@"c:\Docs\Source\a.txt"
Since \ is used a lot in regular expressions, @-quoting is often used to avoid excessive doubling.
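The two quoting styles can be checked side by side. This is only a sketch: the class name QuotingDemo, its members, and the a\.txt pattern are invented for the example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class QuotingDemo
{
    // The same path written as a regular literal and as an @-quoted literal.
    public const string Regular = "c:\\Docs\\Source\\a.txt";
    public const string Verbatim = @"c:\Docs\Source\a.txt";

    // The same regex (a literal period between "a" and "txt") in both styles.
    public static bool MatchRegular(string input) => Regex.IsMatch(input, "a\\.txt");
    public static bool MatchVerbatim(string input) => Regex.IsMatch(input, @"a\.txt");

    public static void Main()
    {
        Console.WriteLine(Regular == Verbatim);    // True
        Console.WriteLine("\\".Length);            // 1
        Console.WriteLine(@"\\".Length);           // 2
        Console.WriteLine(MatchRegular("a.txt"));  // True
        Console.WriteLine(MatchVerbatim("a.txt")); // True
        Console.WriteLine(MatchVerbatim("aXtxt")); // False
    }
}
```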
On character classes
Regular expression engines allow you to define character classes, e.g. [aeiou] is a character class containing the 5 vowel letters. You can also use the - metacharacter to define a range, e.g. [0-9] is a character class containing all 10 digit characters.
Since digit characters are so frequently used, regex also provides a shorthand notation for them, which is \d. In C#, this will also match decimal digits from other Unicode character sets, unless you're using RegexOptions.ECMAScript, where it's strictly just [0-9].
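Here is a minimal sketch of the \d behavior described above. The class name DigitDemo and its helpers are made up, and the Arabic-Indic digit three (U+0663) is just one sample of a non-ASCII decimal digit that I picked for the test.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DigitDemo
{
    // Default .NET behavior: \d matches any Unicode decimal digit.
    public static bool IsDigitUnicode(string s) =>
        Regex.IsMatch(s, @"^\d$");

    // With RegexOptions.ECMAScript, \d is limited to [0-9].
    public static bool IsDigitEcma(string s) =>
        Regex.IsMatch(s, @"^\d$", RegexOptions.ECMAScript);

    // An explicit character class for the five vowel letters.
    public static bool IsVowel(string s) =>
        Regex.IsMatch(s, "^[aeiou]$");

    public static void Main()
    {
        string arabicIndicThree = "\u0663";
        Console.WriteLine(IsDigitUnicode("7"));              // True
        Console.WriteLine(IsDigitUnicode(arabicIndicThree)); // True
        Console.WriteLine(IsDigitEcma(arabicIndicThree));    // False
        Console.WriteLine(IsVowel("e"));                     // True
        Console.WriteLine(IsVowel("z"));                     // False
    }
}
```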
Putting it all together
It looks like the following will work for you:
      @-quoting           digits     anything but ', captured
          |                   / \    /     \
new Regex(@"GuestbookWidget\('\d*', '([^']*)', 500\);", RegexOptions.IgnoreCase);
                           \/                     \/
                        escape (               escape )
Note that I've modified the pattern slightly so that it uses a negated character class instead of reluctant wildcard matching. This causes a slight difference in behavior if you allow ' to be escaped in your input string, but neither pattern handles this case perfectly. If you're not allowing ' to be escaped, however, this pattern is definitely better.
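For completeness, here is one way the pattern might be used to pull out the captured value. This is a sketch only: the class GuestbookDemo, the TryExtract helper, and both sample inputs are hypothetical, since I don't know what your actual input looks like.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GuestbookDemo
{
    private static readonly Regex Pattern = new Regex(
        @"GuestbookWidget\('\d*', '([^']*)', 500\);",
        RegexOptions.IgnoreCase);

    // Returns true and sets value to the text of group 1 when the input matches;
    // otherwise returns false and sets value to the empty string.
    public static bool TryExtract(string input, out string value)
    {
        Match m = Pattern.Match(input);
        value = m.Success ? m.Groups[1].Value : "";
        return m.Success;
    }

    public static void Main()
    {
        // Hypothetical input in the shape the pattern expects.
        string sample = "GuestbookWidget('12345', 'abcDEF', 500);";
        if (TryExtract(sample, out string captured))
        {
            Console.WriteLine(captured); // abcDEF
        }

        // An escaped ' inside the second argument stops [^']* early,
        // so the pattern as a whole fails to match.
        string escaped = "GuestbookWidget('12345', 'it\\'s', 500);";
        Console.WriteLine(TryExtract(escaped, out _)); // False
    }
}
```

The second call shows the behavioral difference mentioned above: with an escaped ' in the input, the negated character class refuses to match rather than capturing past the quote.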