I'm trying to match a comma-separated string of numbers to a certain pattern within an sql query. I used regular expressions for similar problems in the past successfully, so I'm trying to get them working here as well. The problem is as follows:
- The string may contain any number in a range (e.g. 1-4) exactly 0-1 times.
- Two numbers are comma-separated
- The numbers have to be in ascending order
(I think this is kind of a case of ordered sampling without replacement)
Sticking with the example of 1-4, the following entries should match:
1
1,2
1,3
1,4
1,2,3
1,2,4
1,3,4
1,2,3,4
2
2,3
2,4
3
3,4
4
and these should not:
q dawda 323123 a3 a1 1aa,1234 4321 a4,32,1a 1112222334411
1,,2,33,444, 11,12,a 234 2,2,3 33 3,3,3 3,34 34 123 1,4,4,4a 1,444
The best try I currently have is:
\b[1-4][\,]?[2-4]?[\,]?[3-4]?[\,]?[4]?\b
This still has two major drawbacks:
- It delivers quite a lot of false positives. Numbers are not eliminated after they occurred once.
- It will get rather long, when the range of numbers increases, e.g. 1-18 is already possible as well, bigger ranges are thinkable of.
I used regexpal for testing purposes.
Side notes:
- As I'm using sql it would be possible to implement some algorithm in another language to generate all the possible combinations and save them in a table that can be used for joining, see e.g. How to get all possible combinations of a list’s elements?. I would like to only rely on that as a last resort, as the creation of new tables will be involved and these will contain a lot of entries.
- The resulting sql statement that uses the regex should run on both Postgres and Oracle.
- The set of positive examples is also referred to as "powerset".
Edit: Clarified the list of positive examples