
I am working on a Python function that parses a string representing a SQL query and returns the numbers in it (both integers and floats). I need them to highlight the numbers in the GUI.

I use https://regex101.com/ to test the regular expressions I build, and I have almost got it working, except when a number is part of a column name in the SQL query.

`[-+]?[0-9]*\.?[0-9]+([eE][-+]?[0-9]+)?`

This would also match the `1990` in the column name `pop1990` in `pop1990 in 220, pop1990 in (5, 100, 7.8,25;)`, which I want to avoid. It looks as if I may need negation using `^`, but I am not sure how that would work.

When searching for numbers, how do I exclude cases where the number comes right after a word character (`\w`) but still match it when it comes before or after `(`, `,`, `;` and so forth?
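For reference, the behavior described in the question can be reproduced with Python's `re` module. This minimal sketch applies the question's regex unchanged to the example string; because the pattern contains a capturing group, it collects whole matches with `re.finditer` and `group(0)`:

```python
import re

# The regex from the question, unchanged.
NUMBER = r"[-+]?[0-9]*\.?[0-9]+([eE][-+]?[0-9]+)?"

sql = "pop1990 in (5, 100, 7.8,25;)"
# group(0) is the whole match, regardless of the capturing group in the pattern.
matches = [m.group(0) for m in re.finditer(NUMBER, sql)]
print(matches)  # ['1990', '5', '100', '7.8', '25']; '1990' comes from pop1990
```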

Alex Tereshenkov
  • Possible duplicate of [Regex match entire words only](https://stackoverflow.com/questions/1751301/regex-match-entire-words-only) – Sebastian Proske Jan 10 '18 at 15:15
  • @SebastianProske, I am looking for a way to match numbers, not to find entire words. How is the question you posted above helpful? – Alex Tereshenkov Jan 10 '18 at 15:21
  • Well, you most likely will have to use some kind of custom word boundary using lookarounds, e.g. `(?<!\w)` and `(?!\w)` - let me look for a better dupe target -> https://stackoverflow.com/questions/14232931/custom-word-boundaries-in-regular-expression – Sebastian Proske Jan 10 '18 at 15:28
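The lookaround idea from the last comment can be sketched in Python as follows, assuming the question's number pattern is kept and only wrapped in `(?<!\w)` (not preceded by a word character) and `(?!\w)` (not followed by one). The exponent group is made non-capturing here, and `find_number_spans` is a hypothetical helper name; returning start/end offsets is just one possible shape for GUI highlighting.

```python
import re

# Question's pattern with custom word boundaries on both sides.
# (?:...) keeps the exponent part non-capturing so group(0) is the full number.
NUMBER = re.compile(r"(?<!\w)[-+]?[0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)?(?!\w)")


def find_number_spans(sql):
    """Hypothetical helper: return (start, end, text) for each standalone number."""
    return [(m.start(), m.end(), m.group(0)) for m in NUMBER.finditer(sql)]


print(find_number_spans("pop1990 in (5, 100, 7.8,25;)"))
# [(12, 13, '5'), (15, 18, '100'), (20, 23, '7.8'), (24, 26, '25')]
```

Because of the trailing `(?!\w)`, a token such as `5abc` is not matched either; drop it if only the preceding character matters.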

1 Answer


Maybe this is what you're looking for:

https://regex101.com/r/BgEP7C/1

The regex is `(?<=[^\w])[\d\.]+`.

  • `(?<=[^\w])` is a positive lookbehind: it checks that the character before the number is NOT a `\w` (word character), but it doesn't include that character in the match.
  • `[\d\.]+` matches one or more digits and `.` characters (for the decimal point).
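A quick check of how the positive lookbehind `(?<=[^\w])` differs from the negative lookbehind `(?<!\w)` suggested in the comments: the positive form needs an actual non-word character before the number, so a number at the very start of the string is not matched. A minimal sketch on a made-up SQL fragment:

```python
import re

# First regex from the answer: requires a real non-word character before the number.
ANSWER = r"(?<=[^\w])[\d\.]+"
# Negative lookbehind: "not preceded by a word character",
# which also succeeds at the very start of the string.
NEGATIVE = r"(?<!\w)[\d\.]+"

sql = "5 < pop1990 and x in (5, 7.8)"
print(re.findall(ANSWER, sql))    # ['5', '7.8'] (the leading 5 is missed)
print(re.findall(NEGATIVE, sql))  # ['5', '5', '7.8']
```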

Updated so it also handles `+`, `-` and `e` (exponents):

Link: https://regex101.com/r/BgEP7C/5

Regex: `(?i)(?<=[^\w])[-+]?[\d\.]+(e[-+]?\d+)?`
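A sketch of the updated regex on a made-up string with signs and exponents. Since the pattern contains a capturing group, `re.findall` returns only that group's contents; taking `group(0)` from `re.finditer` gives the whole match:

```python
import re

# Updated regex from the answer; (?i) lets "e" also match "E".
UPDATED = r"(?i)(?<=[^\w])[-+]?[\d\.]+(e[-+]?\d+)?"

sql = "x between -2.5E3 and (+1e-4)"

# Whole matches via finditer + group(0).
print([m.group(0) for m in re.finditer(UPDATED, sql)])  # ['-2.5E3', '+1e-4']

# findall returns the capturing group instead of the whole match.
print(re.findall(UPDATED, sql))  # ['E3', 'e-4']
```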

REQUESTED EDIT

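The original version returns only empty strings from `re.findall` in Python because of the `(e[-+]?\d+)?` part: it is a capturing group, and when a pattern contains a group, `re.findall` returns the group's contents instead of the whole match.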

A version that works with `re.findall` in Python: `(?i)(?<=[^\w])[-+]?[\d\.]+e?[+-]?\d*`
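The version above makes every part of the exponent optional, so it can also absorb a `-` that is really an operator. An alternative, not from the original answer, is to keep the original exponent group and make it non-capturing with `(?:...)`. A comparison on a made-up string:

```python
import re

# Version from the edit: e?[+-]?\d* is entirely optional,
# so "10-2" is consumed as a single "number".
LOOSE = r"(?i)(?<=[^\w])[-+]?[\d\.]+e?[+-]?\d*"
# Alternative (assumption, not from the answer): original exponent group,
# made non-capturing so re.findall returns whole matches.
STRICT = r"(?i)(?<=[^\w])[-+]?[\d\.]+(?:e[-+]?\d+)?"

sql = "pop1990 in (5,100,5) and y = (10-2)"
print(re.findall(LOOSE, sql))   # ['5', '100', '5', '10-2']
print(re.findall(STRICT, sql))  # ['5', '100', '5', '10', '2']
```

Which behavior is preferable depends on how the GUI should treat expressions such as `10-2`.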

francisco sollima
  • Thanks, the first one works. Interestingly enough, the second expression returns only empty strings in Python 3.5. Try `import re; re.findall(r"(?i)(?<=[^\w])[-+]?[\d\.]+(e[-+]?\d+)?", 'pop1990 in (5,100,5)')`, which gives `['', '', '']` back – Alex Tereshenkov Jan 10 '18 at 15:57
  • You're right. Apparently the problem is this part `(e[-+]?\d+)?`. `(?i)(?<=[^\w])[-+]?[\d\.]+e?[+-]?\d*` works. You want me to add it to the answer? – francisco sollima Jan 10 '18 at 16:02
  • Try `re.findall(r"(?i)(?<=[^\w])[-+]?[\d\.]+e?[+-]?\d*", 'pop1990 in (5,100,5)')` – francisco sollima Jan 10 '18 at 16:02