
I would like to extract numbers from a string such as

There are 1,000 people in those 3 towns.

and get an array like ["1,000", "3"].

I got the following number-matching regex from Justin in this question:

^[+-]?(\d*|\d{1,3}(,\d{3})*)(\.\d+)?\b$

This works great for checking whether a whole string is a number, but to make it work on a sentence you need to remove the "^" and "$".

regex101 with start/end defined regex101 without start/end defined

Without the start and end anchors you get a bunch of zero-length matches. These can easily be discarded, but the regex now also splits any number with a comma in it.
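
To make the symptom concrete, here is a minimal sketch that runs the unanchored expression over the example sentence. The variable names are only for illustration. The split seems to happen because the `\d*` alternative is tried first and succeeds on just "1", since `\b` already holds before the comma.

```javascript
// The question's expression with "^" and "$" removed and the g flag added.
const unanchored = /[+-]?(\d*|\d{1,3}(,\d{3})*)(\.\d+)?\b/g;
const sentence = "There are 1,000 people in those 3 towns.";

// matchAll steps past zero-length matches, so iteration cannot get stuck.
const all = Array.from(sentence.matchAll(unanchored), m => m[0]);
const nonEmpty = all.filter(s => s !== "");

console.log(all.length - nonEmpty.length, "zero-length matches");
console.log(nonEmpty); // ["1", "000", "3"]: "1,000" has been split
```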

How do I make that regex (or a new regex) work on sentences and still find numbers with commas in them?

A bonus would be not having all the 0 length matches as well.

Sam Dean

3 Answers


The expression /-?\d(?:[,\d]*\.\d+|[,\d]*)/g should do it, if you're okay with allowing different groupings such as 1,00,000 (which isn't unknown in some locales). I feel like I should be able to simplify that further, but when I try, the example "333.33" gets broken up into "333" and "33" as separate numbers. With the above it's kept together.

Live Example:

const str = "There are 10,000 people in those 3 towns. That's 3,333.33 people per town, roughly. Which is about -67.33 from last year.";
const rex = /-?\d(?:[,\d]*\.\d+|[,\d]*)/g;
let match;
while ((match = rex.exec(str)) !== null) {
    console.log(match[0]);
}

Breaking /-?\d(?:[,\d]*\.\d+|[,\d]*)/g down:

  • -? - an optional minus sign (thank you to x15 for flagging that up in his/her answer!)
  • \d - a digit
  • (?:...|...) - a non-capturing group containing an alternation between
    • [,\d]*\.\d+ - zero or more commas and digits followed by a . and one or more digits, e.g. 3,333.33; or
    • [,\d]* - zero or more commas and digits

The first alternative will match greedily, falling back to the second alternative if there's no decimal point.
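
As a rough sketch of how the expression behaves on other input, the hypothetical helper below (the name `extractNumbers` is not from the answer) collects all matches with `matchAll`. One thing worth noticing: because `[,\d]*` accepts a comma in any position, a number directly followed by a comma keeps that comma in the match.

```javascript
// Hypothetical helper wrapping the expression from this answer.
function extractNumbers(text) {
    const rex = /-?\d(?:[,\d]*\.\d+|[,\d]*)/g;
    return Array.from(text.matchAll(rex), m => m[0]);
}

const sample = "There are 10,000 people in those 3 towns. That's 3,333.33 people per town, roughly. Which is about -67.33 from last year.";
console.log(extractNumbers(sample));
// ["10,000", "3", "3,333.33", "-67.33"]

// Non-thousands grouping is accepted, and a trailing comma is kept.
console.log(extractNumbers("Prices rose from 1,00,000 in 2019, then fell."));
// ["1,00,000", "2019,"]
```

If the trailing comma matters for your data, stripping it afterwards (for example with `s.replace(/,+$/, "")`) is one possible follow-up step.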

T.J. Crowder
  • Thanks very much! I added `(\.\d+)?` to the end of it to allow for decimals. – Sam Dean Nov 01 '19 at 17:24
  • @SamDean - Ah, good point! But I don't think that's going to be sufficient, one sec. – T.J. Crowder Nov 01 '19 at 17:26
  • @SamDean - I've updated it to handle fractional numbers, sorry I missed that. It's more complicated than just adding `(\.\d+)?` after it. :-) – T.J. Crowder Nov 01 '19 at 17:37
  • @SamDean - Um...we didn't handle minus signs. :-) So probably want a `-?` on the front of that. (A `[+-]?` if you want to allow unary `+` as well.) – T.J. Crowder Nov 01 '19 at 18:35

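One alternative approach is to split on whitespace and check whether each token can be parsed as a number:

// Keep tokens that parse as a number once '.' and ',' are removed.
// Number.isNaN is used so that a token such as "0" is not dropped as falsy.
const numberExtractor = str => str.split(/\s+/)
    .filter(v => v && !Number.isNaN(parseFloat(v.replace(/[.,]/g, ''))));

console.log(numberExtractor('There are 1,000 people in those 3 towns. some more numbers -23.012 1,00,000,00'));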
Code Maniac

Matching integer and decimal numbers, where the whole part can have optional
commas between digits but the decimal part cannot, is done like this:

/[+-]?(?:(?:\d(?:,(?=\d))?)+(?:\.\d*)?|\.\d+)/

https://regex101.com/r/yOuBPx/1

The input sample does not reflect all the boundary conditions this regex handles.
Best to experiment to see its full effect.
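
A few of those boundary conditions can be probed with a short sketch like the one below. The `g` flag and the helper name `findAll` are assumptions added here so that every match is returned. They are not part of the expression as posted.

```javascript
// The answer's expression; the g flag is an assumption added to find every match.
const boundaryRex = /[+-]?(?:(?:\d(?:,(?=\d))?)+(?:\.\d*)?|\.\d+)/g;

// Hypothetical helper: String.prototype.match returns null when nothing matches.
const findAll = text => text.match(boundaryRex) || [];

console.log(findAll("There are 1,000 people in those 3 towns."));
// ["1,000", "3"]

// A comma is only consumed when a digit follows it, a trailing dot is kept,
// and a leading dot or sign is accepted.
console.log(findAll("In 2019, it was 12. or .5 or +7.25"));
// ["2019", "12.", ".5", "+7.25"]
```

Since the pattern always requires at least one digit or a dot followed by digits, it should not produce zero-length matches.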

  • When posting an answer well after an accepted one, it's useful to point out what it handles that the previous one doesn't, so people can see that more easily and recognize the benefit of the additional answer. (If there isn't anything or it's not significant, then not posting at all is probably best, small things can be comments on answers.) – T.J. Crowder Nov 01 '19 at 18:34
  • @TJ.Crowder - Just stated what it does and that the regex should be explored for all the boundary effects. Significantly different than the accepted answer. –  Nov 01 '19 at 19:47
  • @Thefourthbird - It could be the trailing dot would distinguish it as a float. But, given that commas can be in non-thousands places, it's a jumbled spec. –  Nov 01 '19 at 19:48