
"The quick brown fox jumps over the lazy dog" is an English-language pangram, alphabet! that is, a phrase that contains all of the letters of the alphabet. It has been used to test typewriters alphabet. and computer keyboards, and in other applications involving all of the letters in the English alphabet.

I need to match the word "alphabet." (including the period) with a regex. The text above contains 3 instances. The result should not include "alphabet!". I tried this:

 MatchCollection match = Regex.Matches(entireText, "alphabet."); 

but this returns 4 instances, including "alphabet!". How can I exclude it and get only "alphabet."?
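To see where the fourth match comes from, here is a minimal sketch that counts matches for the unescaped and the escaped pattern against the question's text (as used in the full-code answer below). The class name `DotDemo` and the helper `CountMatches` are hypothetical. An unescaped "." accepts the "!" after the first "alphabet", while "\." accepts only a literal period.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class: compares an unescaped and an escaped dot.
public static class DotDemo
{
    // The question's text, without the surrounding quotation marks.
    public const string Text =
        "The quick brown fox jumps over the lazy dog is an English-language pangram, alphabet! " +
        "that is, a phrase that contains all of the letters of the alphabet. " +
        "It has been used to test typewriters alphabet. and computer keyboards, " +
        "and in other applications involving all of the letters in the English alphabet.";

    // Number of matches of `pattern` in Text.
    public static int CountMatches(string pattern) => Regex.Matches(Text, pattern).Count;

    public static void Main()
    {
        // Unescaped "." matches any single character except '\n', so "alphabet!" matches too.
        Console.WriteLine(CountMatches("alphabet."));   // 4
        // "\." matches only a literal period.
        Console.WriteLine(CountMatches(@"alphabet\.")); // 3
    }
}
```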

Alastair Pitts
pili

3 Answers


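"." is a special character in regex that matches any single character (except a newline, unless RegexOptions.Singleline is set). Try escaping it with a backslash. The @ makes the string verbatim, so C# passes the backslash to the regex unchanged: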

 MatchCollection match = Regex.Matches(entireText, @"alphabet\.");
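When the literal text comes from a variable instead of being typed into the pattern, the static `Regex.Escape` method can insert the backslashes for you. A minimal sketch; the `EscapeDemo` class, the `CountLiteral` helper and the sample strings are illustrative, not part of the answer:

```csharp
using System;
using System.Text.RegularExpressions;

// Illustrative helper class (not from the original answer).
public static class EscapeDemo
{
    // Counts occurrences of `literal` in `text`, treating every regex metacharacter literally.
    public static int CountLiteral(string text, string literal) =>
        Regex.Matches(text, Regex.Escape(literal)).Count;

    public static void Main()
    {
        // Regex.Escape backslash-escapes metacharacters such as '.'.
        Console.WriteLine(Regex.Escape("alphabet.")); // alphabet\.
        Console.WriteLine(CountLiteral("alphabet! then alphabet. and alphabet.", "alphabet.")); // 2
    }
}
```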
Håvard
  • Same answers within seconds :) – manojlds Apr 17 '11 at 22:40
  • Hi Harpyon, no results are returned for this expression. If I just put "alphabet", there are 4 instances. Is there any C#-specific syntax? – pili Apr 17 '11 at 22:45
  • Are you sure it doesn't work? I'm unable to test the C# portion of it, but the regex seems to be working when I test it on [RegexHero](http://regexhero.net/tester/). – Håvard Apr 17 '11 at 22:48
  • Hi Harpyon, C# wanted to have this option, and then it worked. Thanks! `RegexOptions myRegexOptions = RegexOptions.None; Regex myRegex = new Regex(strRegex, myRegexOptions);` – pili Apr 17 '11 at 22:53
  • Hi Harpyon, if I want to get "alphabet" only when it is followed by " " or "\n", how should I amend that, please? – pili Apr 17 '11 at 23:09
  • If you want to get `alphabet` only followed by a space or newline, you can use a lookahead: `alphabet(?= |\n)`. – Håvard Apr 17 '11 at 23:16
  • It wasn't obvious, so I'll underline it: to make the expression work, you need to add the `@` sign before the string. – Hi-Angel Dec 24 '14 at 12:06
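Håvard's lookahead can also be combined with the escaped period. The sketch below uses `alphabet\.(?= |\n)`; the lookahead checks the next character without consuming it, so each match value is still "alphabet.". A string-final "alphabet." (like the last one in the question) has no following character, so it only matches if `$` is added as an alternative. The class name, helper and sample text are hypothetical.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical demo class for the lookahead variant.
public static class LookaheadDemo
{
    // Returns the matched values for `pattern` in `text`.
    public static string[] Values(string text, string pattern) =>
        Regex.Matches(text, pattern).Cast<Match>().Select(m => m.Value).ToArray();

    public static void Main()
    {
        string text = "alphabet! that\nthe alphabet. It\ntypewriters alphabet.\nand the English alphabet.";

        // Literal "alphabet." followed by a space or a newline.
        string[] spaceOrNewline = Values(text, @"alphabet\.(?= |\n)");
        Console.WriteLine(spaceOrNewline.Length); // 2

        // Also accept the end of the string.
        string[] orEnd = Values(text, @"alphabet\.(?= |\n|$)");
        Console.WriteLine(orEnd.Length); // 3
    }
}
```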

"." is a special character in regular expressions. You need to escape it with a backslash first:

Regex.Matches(entireText, "alphabet\\.")

The backslash ends up doubled because, inside a regular (non-verbatim) C# string literal, \ must itself be escaped with another backslash.
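Both spellings produce the same pattern string; a verbatim literal (@"...") only saves you from doubling the backslash. A quick check, with a hypothetical class name:

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class comparing the two literal forms.
public static class LiteralDemo
{
    // Regular literal: "\\" is the escape sequence for one backslash.
    public const string Regular = "alphabet\\.";
    // Verbatim literal: the backslash is taken as-is.
    public const string Verbatim = @"alphabet\.";

    public static void Main()
    {
        Console.WriteLine(Regular == Verbatim); // True
        // 8 letters + one backslash + one period.
        Console.WriteLine(Regular.Length);      // 10
        // Writing "alphabet\." without @ does not compile: \. is not a valid C# escape sequence.
        Console.WriteLine(Regex.IsMatch("the alphabet.", Verbatim)); // True
    }
}
```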

Jon
  • Usually, regular expression strings are better off being verbatim. – manojlds Apr 17 '11 at 22:40
  • @manojlds: I hope you agree that this is a matter of preference. – Jon Apr 17 '11 at 22:41
  • Yes, but already complex regular expressions would then have \\ strewn all around. – manojlds Apr 17 '11 at 22:41
  • Thanks, guys, but no results are returned for the expression. Is there a C#-specific expression, like "^ $"? – pili Apr 17 '11 at 22:49
  • @user712307 - OK, your comment makes it a little clearer. How are you initializing entireText? – manojlds Apr 17 '11 at 22:54
  • @manojlds: `RegexOptions myRegexOptions = RegexOptions.None; Regex myRegex = new Regex(strRegex, myRegexOptions);` I did that and it worked. Thanks for your comments. – pili Apr 17 '11 at 23:03

"." has a special meaning in regular expressions. Escape it to match a literal period:

MatchCollection match = Regex.Matches(entireText, @"alphabet\.");

Edit:

Full code, giving expected result:

        using System;
        using System.Text.RegularExpressions;
        class Program
        {
            static void Main()
            {
                string entireText = @"The quick brown fox jumps over the lazy dog is an English-language pangram, alphabet! that is, a phrase that contains all of the letters of the alphabet. It has been used to test typewriters alphabet. and computer keyboards, and in other applications involving all of the letters in the English alphabet.";
                MatchCollection matches = Regex.Matches(entireText, @"alphabet\.");
                foreach (Match match in matches)
                {
                    // With no capture groups, Groups holds only group 0 (the whole match).
                    foreach (Group group in match.Groups)
                    {
                        Console.WriteLine(group);
                    }
                }
            }
        }
manojlds
  • Hi manojlds, no results are returned for this expression. If I just put "alphabet", there are 4 instances. Is there any C#-specific syntax? – pili Apr 17 '11 at 22:46
  • Just verified in C#. It gives three "alphabet." matches. Please verify your code. See my edit. – manojlds Apr 17 '11 at 22:52
  • Hi manojlds, thanks for your code, it works. I added this portion: `RegexOptions myRegexOptions = RegexOptions.None; Regex myRegex = new Regex(strRegex, myRegexOptions);` – pili Apr 17 '11 at 23:07
  • If I want to get "alphabet" only when it is followed by " " or "\n", how should I amend that, please? – pili Apr 17 '11 at 23:08
  • Use something like `\salphabet\.` for the regex. `\s` matches any whitespace character (spaces, tabs, line breaks). – manojlds Apr 17 '11 at 23:14
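Note that `\salphabet\.` consumes the whitespace, so every match value starts with that space or line break, and it tests the character before the word rather than after it. If only the word itself should be returned, a lookbehind such as `(?<=\s)alphabet\.` (.NET regular expressions support lookbehind) checks the preceding character without including it in the match. A sketch with a hypothetical class name and shortened sample text:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical demo class contrasting a consuming \s with a lookbehind.
public static class LookbehindDemo
{
    // Returns the matched values for `pattern` in `text`.
    public static string[] Values(string text, string pattern) =>
        Regex.Matches(text, pattern).Cast<Match>().Select(m => m.Value).ToArray();

    public static void Main()
    {
        string text = "pangram, alphabet! that is, the alphabet. It has typewriters alphabet. and the English alphabet.";

        // \s is part of the match, so each value begins with a space (prints "[ alphabet.]" three times).
        foreach (string v in Values(text, @"\salphabet\."))
        {
            Console.WriteLine($"[{v}]");
        }

        // The lookbehind only checks the preceding character (prints "[alphabet.]" three times).
        foreach (string v in Values(text, @"(?<=\s)alphabet\."))
        {
            Console.WriteLine($"[{v}]");
        }
    }
}
```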