Looking at some of the regex questions commonly asked on SO, it seems to me there are a number of areas where traditional regex syntax falls short of the kinds of tasks people want it to do nowadays. For instance:
- I want to match a number between 1 and 31. How do I do that?
The usual answer is: don't use regex for this, use ordinary numeric comparisons. That's fine if you have just the number by itself, but not so great when you want to match the number as part of a longer string. Why can't we write something like `\d{1~31}` and either extend the regex to do some form of counting, or have the regex engine internally translate it into `[1-9]|[12]\d|3[01]`?
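For comparison, the 1 to 31 case can already be handled today by generating the alternation instead of writing it by hand. This is a minimal Python sketch, not a proposal for new syntax: `range_to_regex` is a hypothetical helper (not part of any library) that simply lists every number in the range, longest first, and wraps the result in lookarounds so a match cannot start or end in the middle of a longer number. It assumes numbers are written without leading zeros.

```python
import re

def range_to_regex(lo: int, hi: int) -> str:
    """Hypothetical helper: pattern matching the integers lo..hi (no leading zeros)."""
    # Longest numbers first, so "31" is tried before "3" in the alternation.
    alternatives = sorted((str(n) for n in range(lo, hi + 1)), key=len, reverse=True)
    # (?<!\d) and (?!\d) keep the match from being part of a longer digit run.
    return r"(?<!\d)(?:" + "|".join(alternatives) + r")(?!\d)"

DAY = re.compile(range_to_regex(1, 31))

# The hand-written equivalent from the question, with the same lookarounds.
HAND_WRITTEN = re.compile(r"(?<!\d)(?:[1-9]|[12]\d|3[01])(?!\d)")

text = "Due on day 7, not 32, maybe 31 or 05."
print(DAY.findall(text))
print(HAND_WRITTEN.findall(text))
```

Both patterns pick out only `7` and `31` from that sample; `32` and `05` are rejected by the lookarounds. Listing every value is acceptable for small ranges, but for large ones you would want to compress the output into character-class ranges like the hand-written version.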
- How do I match an even or odd number of occurrences of a specific string?
This results in a very messy regex; it would be great to be able to just write `(mytext){Odd}`.
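To show how messy this gets, here is one way to express parity in standard regex syntax, as a minimal Python sketch. It assumes the target string (`mytext`) cannot overlap with itself, so counting non-overlapping matches gives the true number of occurrences. `GAP` matches a run of characters at none of which an occurrence begins; an even count is any number of pairs, and an odd count is one occurrence followed by pairs. `WORD`, `GAP`, `PAIR`, `EVEN` and `ODD` are illustrative names only.

```python
import re

WORD = "mytext"  # assumed not to overlap itself (no proper prefix is also a suffix)
W = re.escape(WORD)

# A run of characters where no occurrence of WORD starts.
GAP = rf"(?:(?!{W})[\s\S])*"
# Exactly two occurrences, each followed by a gap.
PAIR = rf"{W}{GAP}{W}{GAP}"

EVEN = re.compile(rf"{GAP}(?:{PAIR})*")
ODD = re.compile(rf"{GAP}{W}{GAP}(?:{PAIR})*")

for s in ["", "mytext", "a mytext b mytext c", "mytextmytextmytext"]:
    label = "odd" if ODD.fullmatch(s) else "even" if EVEN.fullmatch(s) else "?"
    print(repr(s), label)
```

Because parity only needs two states, it stays within what a regular expression can describe; the verbosity comes from the syntax, not from any missing power.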
- How do I parse XML with regex?
We all know that's a bad idea, but this and similar tasks would be easier if the `[^ ]` operator weren't limited to a single character. It'd be nice to be able to write `<name>(.*)[^(</name>)]`.
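The usual workaround for a multi-character negated class in current engines is a "tempered greedy token", `(?:(?!</name>).)*`, which consumes one character at a time only where `</name>` does not begin. A lazy `.*?` often gives the same result. This is a minimal Python sketch on an assumed sample string; it is still not real XML parsing, since nesting, attributes, comments and CDATA are ignored.

```python
import re

xml = "<name>Alice</name><age>30</age><name>Bob</name>"

# Greedy .* runs to the LAST </name>, swallowing everything in between.
greedy = re.findall(r"<name>(.*)</name>", xml)

# Tempered greedy token: consume a character only if "</name>" does not start
# there, which acts like a multi-character negated class.
tempered = re.findall(r"<name>((?:(?!</name>).)*)</name>", xml)

# A lazy quantifier happens to give the same result for this input.
lazy = re.findall(r"<name>(.*?)</name>", xml)

print(greedy, tempered, lazy, sep="\n")
```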
- How do I validate an email address with regex?
It's very commonly done, and yet very complex to do correctly with regex. It would save everyone from reinventing the wheel if a syntax like `{IsEmail}` could be used instead.
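Something close to `{IsEmail}` can be layered on top of an existing engine as a preprocessing step that expands named macros before compiling. This is an assumption-laden Python sketch: `MACROS` and `expand_macros` are hypothetical names, not a real library, and the email body is deliberately simplified rather than RFC 5322 compliant.

```python
import re

# Hypothetical macro table; names and bodies are illustrative assumptions.
MACROS = {
    # Deliberately simplified; full RFC 5322 address syntax is far more involved.
    "IsEmail": r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)*\.[A-Za-z]{2,}",
    "Day": r"(?:[1-9]|[12]\d|3[01])",
}

# {Name} must start with a letter, so numeric quantifiers like {2,} or {3} are untouched.
_MACRO_TOKEN = re.compile(r"\{([A-Za-z]\w*)\}")

def expand_macros(pattern: str) -> str:
    """Replace each {Name} token with its macro body in a non-capturing group."""
    def substitute(m: re.Match) -> str:
        return "(?:" + MACROS[m.group(1)] + ")"
    return _MACRO_TOKEN.sub(substitute, pattern)

rx = re.compile(expand_macros(r"Contact: ({IsEmail}) on day ({Day})\b"))
m = rx.search("Contact: jane.doe@example.com on day 17.")
print(m.groups())
```

The sketch does not special-case escaped braces such as `\{abc}`, and an unknown macro name raises `KeyError`; a real implementation would need rules for both.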
I'm sure there are others that would be useful too. I don't know enough about regex internals to say how easy these would be to implement, or whether they would even be possible. Implementing some form of counting (to solve the first two problems) may mean it's not technically a 'regular expression' anymore, but it would certainly be useful.
Is a 'regex 2.0' syntax desirable and technically possible, and is anyone working on anything like this?