10

I am trying to check a text line using regex.

1,3,4,5,8,10,12,14,19,14

Here numbers are delimited with ',' and should be non-negetive and less than or equals to 20. And also any number should not be repeated. Here is my pattern.

^(?:(?:0[1-9]|[1-9]|1[0-9]|20),)*(?:0[1-9]|[1-9]|1[0-9]|20)$

But it cant check the repeat. How can I check it ?

Rohit Jain
  • 209,639
  • 45
  • 409
  • 525
Barun
  • 1,885
  • 3
  • 27
  • 47
  • 4
    Regex is not the correct tool for this. Checking for duplicates requires memory. It would be much simpler to just parse the line and check all these conditions in normal code – Paul Phillips Dec 26 '12 at 20:08
  • @PaulPhillips - Actually I know someone who did this. But it is impossible for me to ask him. I just want to know the mechanism and definitely I would not use it into real life application. – Barun Dec 26 '12 at 20:12
  • In "real world" regexes it's probably possible. In theoretical ones it isn't. – Paul Phillips Dec 26 '12 at 20:14
  • I guess it might be possible using some combination of capturing subgroups and negative lookahead, but I don't know exactly how to formulate it. It certainly wouldn't be the most efficient way. – Alex Dec 26 '12 at 20:20
  • _Why_ are you trying to do this with regex instead of real code? – Jesse Webb Dec 26 '12 at 20:30
  • @JesseWebb I just want know the secret of checking this kind of repeatation. Because I know someone who already did this using regex. – Barun Dec 26 '12 at 20:33
  • @Barun Is the amount of numbers fixed (or at least limited)? Otherwise I cannot see how this is possible. – poke Dec 26 '12 at 20:50
  • @poke Yes one line can hold atmost 20 numbers. – Barun Dec 26 '12 at 20:54

5 Answers5

7

What you want to do is not that complicated. You just need to check after each matched number if this number occurs once more in the string:

^(?:(0[1-9]|[1-9]|1[0-9]|20),(?!.*\b\1\b))*(?:0[1-9]|[1-9]|1[0-9]|20)$

See it and test it here on Regexr.

In C#:

string[] myStrings = { "1",
    "1,2",
    "01,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20",
    "01,02,03,04,05,06,07,08,09,10,11,12,13,14,15,16,17,18,19,20",
    "01,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,5",
    "01,02,03,04,05,06,07,08,13,09,10,11,12,13,14,15,16,17,18,19,20" };

Regex reg = new Regex(
    @"^
        (?:(0[1-9]|[1-9]|1[0-9]|20),
            (?!.*\b\1\b) # Fail if the before matched number occurs once more
        )*
        (?:0[1-9]|[1-9]|1[0-9]|20)
    $",
    RegexOptions.IgnorePatternWhitespace
);

foreach (string myString in myStrings)
    Console.WriteLine("{0} {1} a valid string.",
        myString,
        reg.IsMatch(myString) ? "is" : "is not"
    );

Console.ReadLine();
Troy Alford
  • 26,660
  • 10
  • 64
  • 82
stema
  • 90,351
  • 20
  • 107
  • 135
  • Wow just great..I wonder how could not I come with this solution earlier...It looks really simple...Great Job – Barun Dec 27 '12 at 18:54
4

As you have tagged your question with both C# and Java, I’m not going to give you a code solution here, but the basic idea.

If you split the string by ,, you get a list of substrings: "1", "3" , "4", "5", "8", "10", "12", "14", "19", "14". Now, you could just loop over those and try to parse each as an integer. If it fails, it was not a number. And if it succeeds, you can easily check if it is < 0 or > 20. And you can also keep a set of numbers you already had before and check if the current one was repeated.

The bottom line is, that you shouldn’t try to use regular expressions for everything. And your language requirement is not regular anyway (if you need to remember stuff, or count things, it is usually not regular). Perl based RegExps are capable of a bit more than just regular but it is not enough here.

Solution as regular expression

As you said in the comments, one line is limited to hold at most 20 numbers. As each number is also limited to be between zero and twenty, you have a finite amount of possibilities for how the line can actually look. As such, you have a finite language (with a finite number of possible lines). Finite languages are a subset of regular languages and as such, you can “easily” represent the language with regular expressions.

The simplest solution would be to just list every possible line. So, if you had just 3 numbers per line with 5 being the highest number (just to keep things simple), the regular expression could look like this:

0,1,2|0,1,3|0,1,4|0,1,5|0,2,3|0,2,4|0,2,5|0,3,4|0,3,5|0,4,5|1,2,3|1,2,4|1,2,5|1,3,4|1,3,5|1,4,5|2,3,4

Of course, you could simplify that a lot then (maybe even more):

0,(1,(2|3|4|5)|2,(3|4|5)|3,(4|5)|4,5)|1,(2,(3|4|5)|3,(4|5)|4,5)|2,(3,(4|5)|4,5)|3,4,5

But yeah, if you have a requirements that make a language finite, it also gets regular, but not necessarily pretty; and I would argue that the “manual” solution is still a lot more readable and especially more flexible.

poke
  • 369,085
  • 72
  • 557
  • 602
  • 2
    +1 for making me Wikipedia regex to see what "regular" actually means. – Alex Dec 26 '12 at 20:17
  • @Alex +1 for looking it up then ;D – poke Dec 26 '12 at 20:22
  • 1
    Every regex flavour that has lookahead assertions is able to do this easily and that are most modern programming languages, even JavaScript. To list every possible line is far from being the simplest solution in my opinion. – stema Dec 26 '12 at 21:58
  • @stema I know that regexp implementations can do more (I mentioned that), but didn’t know that it was this simple using the lookahead. In any way, the described language is not *regular*, hence my explanation for a finite version. – poke Dec 26 '12 at 22:12
2

Regex is not the best option for this. It gets too hairy fast for repeating numbers. You might want to look at tokenization. Even simple things like looking for a pattern that is NOT present is difficult (see Regular expression to match a line that doesn't contain a word? for an example)

I would split the string by commmas, then add them to an ordered list. If using C#:

"1,2,3,4".Split(',')

to start then continue with Linq to see if your conditions are satisfied.

If you MUST do this with regex, look at iterating over the collection search returns. But this buys you very little over the solution above.

Community
  • 1
  • 1
Adam Dymitruk
  • 124,556
  • 26
  • 146
  • 141
  • Yes I know about negetive lookahead. But it does not support more than 9 matches. – Barun Dec 26 '12 at 20:18
  • Are you forcing yourself to use regex because your list looks short enough to not warrant abstracting to collections? – Adam Dymitruk Dec 26 '12 at 20:20
  • Yes the list could be 0 to 20 any number and any amount of number. So only 9 lookahead assertion wont help me in this case. – Barun Dec 26 '12 at 20:30
  • compared to what the regex is doing underneath, you won't gain anything by avoiding straight Split on the string, parse and collection filterning. If you want terseness for the code, use Linq. – Adam Dymitruk Dec 26 '12 at 20:33
  • Actually I wont use this into any code. I just want to know how I can do this. Because I know someone who already did this using regex. – Barun Dec 26 '12 at 20:36
  • @Barun is this limit on 9 matches for negative lookahead documented somewhere? That's new to me. – Alex Dec 26 '12 at 22:52
1
String[] numbers = input.split(",");
Set<Integer> filtered = new TreeSet();

for(String number: numbers) {
   if(!number.startsWith("-") {
      int nbr = Integer.parseInt(number);

      if(nbr < 20) {
         filtered.add(nbr);
      }
   }
}
for(int nbr: filtered) {
   System.out.print(nbr + " ");
}
asgoth
  • 35,552
  • 12
  • 89
  • 98
0

Since you want regex, yes, you will be limited with back references as they only go from \1 to \9. So you need to exclude pairings. Your greatest challenge is to get rid of the repeating numbers.

from http://www.regular-expressions.info/refadv.html

use (?:(\d?\d),?)+ with (?!<regex>) to ensure you have no duplicates. You can also use (?(?=<regex>)true|false)

I used this page to experiment: http://www.regextester.com/

Adam Dymitruk
  • 124,556
  • 26
  • 146
  • 141
  • As I can see its a simple negative lookahead assertion. How is it going to solve the problem ? – Barun Dec 26 '12 at 21:20
  • Match only the numbers with no other occurrence. Ignore the second group and use \G to re-do the search from the point past the first match. – Adam Dymitruk Dec 27 '12 at 03:57
  • Looks like stema got you the full solution. I was hoping you would have gotten it with the ?! group. – Adam Dymitruk Dec 27 '12 at 11:40