
I'm scratching my head trying to come up with a regex that extracts numbers from strings that are differently formatted. For example:

'1', '1.1', '1,1', '1,000,000.20', '1.00000020', '1.000.000,20', '10.20001'

I currently use the regex `[-+]?[0-9]*[.,]?[0-9]+(?:[eE][-+]?[0-9]+)?`, and it works well in the majority of cases, except for 1,000,000.20 and 1.000.000,20.

Do you have any idea how I can tweak the regex above to work with those examples?
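
To make the failure concrete, here is a minimal Python sketch (Python is only an assumption, since no language is named; `CURRENT_RE`, `SAMPLES` and `failing` are illustrative names) that applies the regex above to each example with `re.fullmatch` and collects the ones it cannot match in full:

```python
import re

# The regex currently in use, copied unchanged from the question.
CURRENT_RE = re.compile(r"[-+]?[0-9]*[.,]?[0-9]+(?:[eE][-+]?[0-9]+)?")

# The example strings from the question.
SAMPLES = ["1", "1.1", "1,1", "1,000,000.20", "1.00000020",
           "1.000.000,20", "10.20001"]

# Keep every sample that the regex cannot match from start to end.
failing = [s for s in SAMPLES if CURRENT_RE.fullmatch(s) is None]

print(failing)  # ['1,000,000.20', '1.000.000,20']
```

The regex allows only one `.` or `,` in the whole number, so any string with repeated thousands separators stops matching after the first group.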

Micha Wiedenmann
joanfihu
  • In your case, you may just capture what is inside single quotes, `'([^']+)'`. – Wiktor Stribiżew Jan 29 '18 at 10:56
  • https://stackoverflow.com/questions/5917082/regular-expression-to-match-numbers-with-or-without-commas-and-decimals-in-text might be useful. In particular you can use "Commas optional as long as they're consistent" twice (with ',' and '.' exchanged). – Micha Wiedenmann Jan 29 '18 at 10:58
  • Are you using a programming language here? – Tim Biegeleisen Jan 29 '18 at 10:59
  • @joanfihu Note that if your texts are clean and numbers do not follow each other (after a comma or dot), you might try something like [`\d[\d.,]*(?:[eE][-+]?\d+)?`](https://regex101.com/r/wUPLoo/1). Enclose it in word boundaries if necessary. – Wiktor Stribiżew Jan 29 '18 at 11:14

1 Answer

(?!\d+,\d+\.\d+,|\d+\.\d+,\d+\.)^([+-]?(?:\d+|\d{1,3}(?:[.,]\d{3})*)(?:[.,]\d+|[eE][+-]?(?:\d+|\d{1,3}(?:[.,]\d{3})*))?)$

Perhaps something like this?

This will match all of the ones you stated, plus numbers written in the format 1e10 and 1e-9.

It will also not match numbers where the comma and dot separators are used inconsistently, e.g. 10.234245,214, 10.234,245.214 or 10,234.245,214.

It also allows an optional + or - at the beginning of a number.
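
As a quick check of these claims, here is a minimal Python sketch (Python is an assumption, since the question does not name a language; `NUMBER_RE` and `is_number` are illustrative names). The last `.` of the lookahead is written escaped as `\.` here, which is an assumption about the intent: left unescaped, it matches any character, so the lookahead would also reject an input such as 1.000,20.

```python
import re

# The pattern from this answer, split over three string literals for readability.
# Assumption: the final dot of the lookahead is escaped (\.) so it only
# matches a literal dot.
NUMBER_RE = re.compile(
    r"(?!\d+,\d+\.\d+,|\d+\.\d+,\d+\.)"       # reject mixed separator orders
    r"^([+-]?(?:\d+|\d{1,3}(?:[.,]\d{3})*)"   # sign, then plain or grouped digits
    r"(?:[.,]\d+|[eE][+-]?(?:\d+|\d{1,3}(?:[.,]\d{3})*))?)$"  # fraction or exponent
)

def is_number(text: str) -> bool:
    """Return True if the whole string matches the pattern."""
    return NUMBER_RE.match(text) is not None

accepted = ["1", "1.1", "1,1", "1,000,000.20", "1.00000020",
            "1.000.000,20", "10.20001", "1e10", "1e-9", "+1", "-1,000.20"]
rejected = ["10.234245,214", "10.234,245.214", "10,234.245,214"]

for s in accepted + rejected:
    print(f"{s!r}: {is_number(s)}")
```

The strings in `accepted` all yield `True` and those in `rejected` yield `False`. The pattern is anchored with `^` and `$`, so it validates one candidate string at a time and does not find numbers embedded in longer text.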

Check it out on Regex101

KyleFairns