Regex to identify all sorts of candidate legal numbers

Question

[This is a heavily re-edited version. Please ignore past versions of this question.]

eyquem provided a small Python script that uses a sophisticated regex to identify numbers in a string and sanitize them. Its test results cover more than 50 samples, which I won't repeat here.

The question: can someone adjust that regex, or provide a new one, so that commas are handled more sensibly?

In particular, I would like the following four test inputs to produce the associated outputs.

' 4,8.3,5 ' -> '4' '8.3' '5'
' 44,22,333,888 ' -> '44' '22,333,888' #### Note that 44,22 is never a single number.
' 11,333e22,444 ' -> '11,333e22' '444' #### 11,333 is accepted in front of e22, but 22,444 is not accepted after it.
' 1,999 people found the code "i+=1999;" to be crystal clear in meaning and to likely lead to less than 1999 kilobytes extra memory consumption; however, the gains in 1, 999, and 1999 KB disk space are anything but ideal, especially this being 1999 and us having over $1,999 to work with! ' -> '1,999' '1999' '1999' '1' '999' '1999' '1999' '1,999'
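
For reference, here is a minimal sketch of one pattern that reproduces the four outputs above. It is not eyquem's regex, and the names NUMBER and find_numbers are hypothetical. It assumes unsigned numbers only and that a comma belongs to a number only when it is followed by exactly three digits; signs, leading-dot decimals, and the other 50-odd samples are not covered.

```python
import re

# Hypothetical pattern, written only against the four samples in the question.
NUMBER = re.compile(r"""
    (?:
        \d{1,3}(?:,\d{3})+(?!\d)   # comma-grouped integer: 1-3 digits, then groups of exactly 3
      |
        \d+                        # otherwise a plain run of digits
    )
    (?:\.\d+)?                     # optional fractional part
    (?:[eE][+-]?\d+)?              # optional exponent (plain digits only, no commas)
""", re.VERBOSE)


def find_numbers(text):
    """Return every non-overlapping candidate number in text, left to right."""
    return NUMBER.findall(text)
```

With this sketch, find_numbers(' 44,22,333,888 ') gives ['44', '22,333,888']: the grouped alternative fails at 44 because ,22 is not a three-digit group, so 44 falls back to the plain-digits alternative and the scan resumes at 22.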

some inspiration can be found here http://regexlib.com/DisplayPatterns.aspx?cattabindex=2&categoryId=3 — Fredrik Pihl, May 10 '11 at 10:05
@Fredrik, thanks for that link. It's a useful resource, but I wish there was a better way to search through there, eg, by typing in your inputs and desired outputs and then the search engine would identify if any of the submitted regex fulfill your criteria. — Jose_X, May 10 '11 at 13:41

score 0 · Answer 1 · answered May 10 '11 at 05:45

Despite all the information, your post is actually vague. For starters, you didn't ask any questions. What is it you want?

Are you asking how to find all possible matches? In Perl, you can use

local our @matches;
/(...)(?{ push @matches, $1 })(?!)/

The (?!) never matches, so it causes the regex engine to backtrack to find another match, but the code block saves what it did find before doing that.
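
Python's re module has no embedded code blocks, so the Perl trick does not carry over directly. As a rough analogue (a different technique, not a translation), the sketch below collects every substring that the pattern can match in full by trying each start and end position. The function name all_matches is hypothetical, and the approach is quadratic in the length of the text.

```python
import re


def all_matches(pattern, text):
    """Collect every substring of text that pattern matches in full.

    Brute force over all (start, end) pairs. Note that endpos makes the
    engine treat the string as ending there, so lookaheads cannot see
    the characters after the candidate substring.
    """
    found = []
    for start in range(len(text)):
        for end in range(start + 1, len(text) + 1):
            if pattern.fullmatch(text, start, end):
                found.append(text[start:end])
    return found
```

For example, all_matches(re.compile(r"\d+"), "a12b") returns ['1', '12', '2'], whereas a plain findall would report only '12'.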

If you're asking to find any match, then it's quite easy to solve: Don't bother looking for option 2, because option 1 will always match what option 2 matches.

ikegami

I edited the question heavily to make it clearer. Do you now understand what it asks? – Jose_X May 10 '11 at 07:47
I edited the question yet again. It's now much more to the point and the requirements were adjusted so that the regex would be more useful. – Jose_X May 10 '11 at 13:35
