2

I'm looping through thousands of strings with various regexes to check for simple errors. I would like to add a regex to check for the correct use of commas.

If a comma exists in one of my strings, then it MUST be followed by either whitespace or exactly three digits:

  • valid: ,\s
  • valid: ,\d\d\d

But if a comma is followed by any other pattern, then it is an error:

  • invalid: ,\D
  • invalid: ,\d
  • invalid: ,\d\d
  • invalid: ,\d\d\d\d

The best regex I've come up with thus far is:

Regex CommaError = new Regex(@",(^(\d\d\d)|\S)"); // fails case #2

To test, I am using:

if (CommaError.IsMatch(", ")) // should NOT match
    Console.WriteLine("failed case #1");
if (CommaError.IsMatch(",234")) // should NOT match
    Console.WriteLine("failed case #2");
if (!CommaError.IsMatch("0,a")) // should match
    Console.WriteLine("failed case #3");
if (!CommaError.IsMatch("0,0")) // should match
    Console.WriteLine("failed case #4");
if (!CommaError.IsMatch("0,0a1")) // should match
    Console.WriteLine("failed case #5");

But the regex I gave above fails case #2 (it matches when it should not).

I've invested several hours investigating this, and searched the Web for similar regexes, but have hit a brick wall. What's wrong with my regex?

Update: Peter posted a comment with a regex that works the way I want:

Regex CommaError = new Regex(@",(?!\d\d\d|\s)");

Edit: Well, almost. It fails in this case:

if (!CommaError.IsMatch("1,2345")) // should match
    Console.WriteLine("failed case #6");
gary
  • What language is this for? Different languages use different regex variants which support different features. – Peter Boughton Oct 31 '09 at 17:37
  • Related question: Regex: Matching by exclusion, without look-ahead - is it possible? http://stackoverflow.com/questions/466053/regex-matching-by-exclusion-without-look-ahead-is-it-possible – jfs Oct 31 '09 at 17:56
  • Test case 2 is valid according to your rules: ,234 is a comma followed by three digits. Case 4 is invalid: a comma followed by one digit. – beggs Nov 01 '09 at 02:55

2 Answers

5

In most regex syntaxes, ^ means "not" only inside a character class (e.g. [^a-b]). Anywhere else it is an anchor that matches the start of the input, so it cannot negate a group.
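To make that concrete for the regex in the question, here is a minimal C# sketch. The class and method names are hypothetical, chosen only for illustration. It shows that the ^ branch of ,(^(\d\d\d)|\S) can never succeed right after a comma, which leaves \S free to match the first digit:

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class; the names are illustrative, not from the question.
public static class CaretDemo
{
    // Outside a character class, ^ anchors to the start of the input
    // (RegexOptions.Multiline is not set), so it cannot match after a comma.
    public static bool AnchorAfterComma(string s) => Regex.IsMatch(s, @",^\d\d\d");

    // Inside a character class, ^ negates: [^\d] means "any non-digit".
    public static bool NonDigitAfterComma(string s) => Regex.IsMatch(s, @",[^\d]");

    // The regex from the question, unchanged.
    public static bool QuestionRegex(string s) => Regex.IsMatch(s, @",(^(\d\d\d)|\S)");

    public static void Main()
    {
        Console.WriteLine(AnchorAfterComma(",234"));  // the anchored branch cannot match here
        Console.WriteLine(NonDigitAfterComma(",a"));  // the negated class matches 'a'
        Console.WriteLine(QuestionRegex(",234"));     // \S matches '2', which is why case #2 fails
    }
}
```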

The simplest thing for you to do would be to invert the condition in your if statement.
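One possible reading of "invert the condition" is to write a regex that describes a fully valid string and report an error when it does not match. The sketch below assumes that reading; the pattern and the names CommaValidator and HasCommaError are my own, not part of the question. It still uses a small (?!\d) lookahead to enforce "exactly three digits", which .NET supports.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper; the pattern is an assumption about what "valid" means,
// based on the rules stated in the question.
public static class CommaValidator
{
    // The whole string must consist of non-comma characters, or commas that are
    // followed by whitespace or by exactly three digits (no fourth digit).
    private static readonly Regex AllCommasValid =
        new Regex(@"^(?:[^,]|,(?:\s|\d{3}(?!\d)))*$");

    // Inverted condition: an error is simply "the valid pattern did not match".
    public static bool HasCommaError(string s) => !AllCommasValid.IsMatch(s);

    public static void Main()
    {
        foreach (var s in new[] { ", ", ",234", "0,a", "0,0", "0,0a1", "1,2345" })
            Console.WriteLine($"\"{s}\" -> error: {HasCommaError(s)}");
    }
}
```

The two alternatives inside the group start with different characters, so the engine never has to choose between them and the match stays linear in the length of the string.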

If you can't do that for whatever reason, you can use a negative lookahead in regex syntaxes that support it, e.g.:

,(?!\d\d\d(?!\d)|\s)
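As a quick check, this lookahead pattern can be run against the six test cases from the question. This is only a sketch: the class name LookaheadCheck and the helper IsError are hypothetical, and the expected results are the ones the question itself states.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical test harness around the lookahead regex from this answer.
public static class LookaheadCheck
{
    // A comma is an error unless it is followed by whitespace
    // or by three digits that are not followed by a fourth digit.
    private static readonly Regex CommaError = new Regex(@",(?!\d\d\d(?!\d)|\s)");

    public static bool IsError(string s) => CommaError.IsMatch(s);

    public static void Main()
    {
        string[] valid = { ", ", ",234" };                      // should NOT match
        string[] invalid = { "0,a", "0,0", "0,0a1", "1,2345" }; // should match

        foreach (var s in valid)
            if (IsError(s)) Console.WriteLine($"unexpected match: \"{s}\"");
        foreach (var s in invalid)
            if (!IsError(s)) Console.WriteLine($"missed error: \"{s}\"");
        Console.WriteLine("done");
    }
}
```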

In regex syntaxes that don't support negative assertions, you can still do what you want, but the bigger the negative match, the more complicated the regex gets, e.g.:

,($|[^ \d]|\d$|\d[^\d]|\d\d$|\d\d[^\d]|\d\d\d\d)

Essentially you have to enumerate all of the bad cases.
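The enumerated pattern can be tried in C# the same way, again as a sketch with hypothetical names (EnumeratedCheck, IsError). One caveat worth noting: [^ \d] treats only a literal space as acceptable after a comma, so a tab would be flagged. Swapping in [^\s\d] would accept any whitespace, assuming the target syntax supports \s inside a character class.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical test harness around the lookahead-free regex from this answer.
public static class EnumeratedCheck
{
    // Each alternative spells out one way a comma can be wrong:
    // end of input, a non-space non-digit, one or two digits only, or four digits.
    private static readonly Regex CommaError =
        new Regex(@",($|[^ \d]|\d$|\d[^\d]|\d\d$|\d\d[^\d]|\d\d\d\d)");

    public static bool IsError(string s) => CommaError.IsMatch(s);

    public static void Main()
    {
        foreach (var s in new[] { ", ", ",234", "0,a", "0,0", "0,0a1", "1,2345" })
            Console.WriteLine($"\"{s}\" -> error: {IsError(s)}");
    }
}
```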

Laurence Gonsalves
0

In which language are you trying to do this? This is a Perl-compatible regular expression to match such a case: ,(?!(\s|\d{3}[^\d])) (it will match commas not followed by a space or exactly 3 digits, so if a string matches this regexp it is not valid)

krcko
  • This one matches ,233 which it should not match – Andomar Oct 31 '09 at 17:36
  • I'm using C#. Using your regex, Regex CommaError = new Regex(@",(?!(\s|\d{3}[^\d]))"); It fails test case #2 for some reason. – gary Oct 31 '09 at 17:36
  • It's failing because the `[^\d]` is saying there has to be a non-digit after the 3 digits. Since the 233 (or 234 in case #2) is at the end of the string, there is no non-digit after the 3 digits. – Laurence Gonsalves Oct 31 '09 at 18:02
  • Instead of `[^\d]` it should be another lookahead: `(?!\d)`. @Laurence, your regex should have that, too. Currently, it fails to flag a comma that's followed by four or more digits, e.g. `1,2345`. – Alan Moore Nov 01 '09 at 01:35
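The difference the last two comments describe can be seen side by side in a short C# sketch. The class and field names are hypothetical; the first pattern is the one from this answer and the second applies the `(?!\d)` substitution suggested above.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical comparison of a consuming [^\d] with a zero-width (?!\d).
public static class BoundaryDemo
{
    // [^\d] must consume a real non-digit character, so it fails at end of input.
    private static readonly Regex Consuming = new Regex(@",(?!(\s|\d{3}[^\d]))");

    // (?!\d) consumes nothing and also succeeds when the input simply ends.
    private static readonly Regex ZeroWidth = new Regex(@",(?!(\s|\d{3}(?!\d)))");

    public static bool ConsumingFlags(string s) => Consuming.IsMatch(s);
    public static bool ZeroWidthFlags(string s) => ZeroWidth.IsMatch(s);

    public static void Main()
    {
        foreach (var s in new[] { ",234", ",234 ", "1,2345" })
            Console.WriteLine(
                $"\"{s}\": consuming flags = {ConsumingFlags(s)}, zero-width flags = {ZeroWidthFlags(s)}");
    }
}
```

The two versions only disagree on inputs such as ",234", where the three digits sit at the very end of the string.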