
I want to extract all the natural numbers that stand alone as words in a given string:

var a = "@1234abc 12 34 5 67 sta5ck over @ numbrs ."
numbers = a.match(/\d+/gi)

In the above string I should only match the numbers 12, 34, 5 and 67, not the 1234 from the first word, the 5 from "sta5ck", etc.

So numbers should be equal to [12, 34, 5, 67].

syllogismos

2 Answers


Use word boundaries:

> var a = "@1234abc 12 34 5 67 sta5ck over @ numbrs ."
undefined
> numbers = a.match(/\b\d+\b/g)
[ '12', '34', '5', '67' ]

Explanation:

  • \b Word boundary, which matches between a word character (\w) and a non-word character (\W), or at the start or end of the string next to a word character.
  • \d+ One or more digits.
  • \b Word boundary, which matches between a word character and a non-word character.
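A minimal sketch of the same approach wrapped in a reusable function. Since `match` returns strings while the question asks for `[12, 34, 5, 67]`, it also converts each match with `Number`. The function name `extractWordNumbers` is hypothetical.

```javascript
// Hypothetical helper: returns every run of digits that has a word
// boundary on both sides, converted from strings to numbers.
function extractWordNumbers(text) {
  // With the g flag, match() returns an array of matched strings, or null
  // when nothing matches, hence the `|| []` fallback.
  const matches = text.match(/\b\d+\b/g) || [];
  return matches.map(Number);
}

const a = "@1234abc 12 34 5 67 sta5ck over @ numbrs .";
console.log(extractWordNumbers(a)); // [ 12, 34, 5, 67 ]

// \b only needs a word/non-word transition, so digits right after a
// non-word character such as "@" are still matched:
console.log(extractWordNumbers("@1234 12")); // [ 1234, 12 ]
```

Note the second call: `\b` does not require whitespace, which is the difference discussed in the comments below.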

OR

> var myString = '@1234abc 12 34 5 67 sta5ck over @ numbrs .';
undefined
> var myRegEx = /(?:^| )(\d+)(?= |$)/g;
undefined
> function getMatches(string, regex, index) {
...     index || (index = 1); // default to the first capturing group
...     var matches = [];
...     var match;
...     while (match = regex.exec(string)) {
.....         matches.push(match[index]);
.....     }
...     return matches;
... }
undefined
> var matches = getMatches(myString, myRegEx, 1);
undefined
> matches
[ '12', '34', '5', '67' ]

Code borrowed from another answer.
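On engines with `String.prototype.matchAll` (ES2020, e.g. Node 12+), a helper like `getMatches` can be written more compactly. This is a sketch of an alternative, not the borrowed code; it assumes the regex has the `g` flag, because `matchAll` throws a `TypeError` without it.

```javascript
// Returns capturing group `index` (default 1) from every match of `regex`.
// Assumes `regex` has the g flag; matchAll throws a TypeError otherwise.
function getMatches(string, regex, index = 1) {
  // Array.from's second argument maps each match array to the chosen group.
  return Array.from(string.matchAll(regex), (match) => match[index]);
}

const myString = "@1234abc 12 34 5 67 sta5ck over @ numbrs .";
const myRegEx = /(?:^| )(\d+)(?= |$)/g;
console.log(getMatches(myString, myRegEx)); // [ '12', '34', '5', '67' ]
```

`matchAll` iterates over an internal copy of the regex, so it leaves `myRegEx.lastIndex` untouched; the `exec` loop above instead relies on `lastIndex` being reset to 0 after the final failed match.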

Avinash Raj
  • Oh, \b represents all the spaces? – syllogismos Aug 03 '14 at 18:49
  • Is the first `\b` needed? – j08691 Aug 03 '14 at 18:50
  • @syllogismos no, it is a zero-width match between word and non-word characters (in any order), see http://www.regular-expressions.info/wordboundaries.html. @j08691 - it is needed as long as you don't want to match `123` from `something123`. – Aprillion Aug 03 '14 at 18:52
  • @AvinashRaj surely you meant "other than `[A-Za-z0-9_]`" – Aprillion Aug 03 '14 at 18:57
  • @j08691 -- it works on `var a` without the first `\b`, doesn't it? But if you change `a` to `"@1234 12 34 5 67 sta5ck over @ numbrs ."` then it won't work without the first `\b` – yitwail Aug 03 '14 at 18:58
  • I don't think that there's any need for a case-insensitive match here. – Tom Fenech Aug 03 '14 at 18:59
  • @yitwail it won't work on `"@1234"` in any case, in the sense that it will match `1234` from that string regardless of the first `\b`, because there is a word boundary between `@` and `1`. But JavaScript doesn't support lookbehind, so a nasty hack would be necessary to exclude that case. – Aprillion Aug 03 '14 at 19:00
  • @Aprillion \w stands for "word character". It always matches the ASCII characters `[A-Za-z0-9_]`. Source: http://www.regular-expressions.info/shorthand.html – Avinash Raj Aug 03 '14 at 19:01
  • @AvinashRaj my point exactly. You said something completely different, which would make this answer incorrect if your statement were true (you implied that `[0-9]` are non-word characters). – Aprillion Aug 03 '14 at 19:02
  • @Aprillion sorry, I forgot to include `0-9`. – Avinash Raj Aug 03 '14 at 19:05
  • @Aprillion -- right you are. I should have tested before adding my comment. – yitwail Aug 03 '14 at 19:14
  • The getMatches function doesn't match 1234 in `@1234`, whereas the first solution does match 1234 in `@1234`. – syllogismos Aug 03 '14 at 20:05
  • Yes, because @ is a non-word character, a word boundary exists between the `@` and the digit 1. So `\b\d+\b` matches 1234 in `@1234`, but the second one matches a number only if the preceding character is the start of the string or a space and the following character is the end of the string or a space. I think now you understand the differences. – Avinash Raj Aug 03 '14 at 20:11
  • When I put myRegEx directly into the String.match function, it matches an extra space along with the number, but the getMatches function doesn't do that. – syllogismos Aug 03 '14 at 20:15
  • Yes, the getMatches function was written to return only the first capturing group. – Avinash Raj Aug 03 '14 at 20:20
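The behavior discussed in these comments can be reproduced with a short sketch. It uses a made-up sample string with a number glued to `@` and compares `\b\d+\b` with the space-delimited pattern, first through `match` and then through an `exec` loop that keeps only group 1.

```javascript
const text = "@1234 12 34";

// \b\d+\b: "@" is a non-word character, so there is a boundary before "1".
console.log(text.match(/\b\d+\b/g)); // [ '1234', '12', '34' ]

// Space-delimited pattern: the number must follow the string start or a space.
const spaceDelimited = /(?:^| )(\d+)(?= |$)/g;

// With the g flag, match() returns whole matches only, so the consumed
// leading space is included and capturing groups are dropped.
console.log(text.match(spaceDelimited)); // [ ' 12', ' 34' ]

// Taking group 1 from each exec() result drops the space.
const numbers = [];
let m;
while ((m = spaceDelimited.exec(text)) !== null) {
  numbers.push(m[1]);
}
console.log(numbers); // [ '12', '34' ]
```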

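If anyone is interested in a proper regex solution to match digits surrounded by whitespace, it is simple in regex flavors that support lookbehind with alternatives of different lengths, such as PCRE. (JavaScript did not support lookbehind at the time of writing, and Python's built-in `re` module rejects this exact pattern because it only allows fixed-width lookbehind.)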

(?<=^|\s)\d+(?=\s|$)

(Debuggex PCRE demo)
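Lookbehind has since been added to JavaScript (ES2018; supported in e.g. Node 10+), so the lookbehind pattern can now be used there directly. The sketch below also shows `(?<!\S)\d+(?!\S)`, an alternative formulation not taken from the original answer, which reads "not preceded or followed by a non-whitespace character". Because each lookaround is a single character wide, it also suits engines that only allow fixed-width lookbehind.

```javascript
const s = "@1234abc 12 34 5 67 sta5ck over @ numbrs .";

// Lookbehind version (needs an engine with ES2018 lookbehind support).
const withLookbehind = s.match(/(?<=^|\s)\d+(?=\s|$)/g);
console.log(withLookbehind); // [ '12', '34', '5', '67' ]

// Negative lookarounds: "not preceded by a non-space" holds at the start
// of the string or after whitespace; likewise for the end.
const withNegatedLookarounds = s.match(/(?<!\S)\d+(?!\S)/g);
console.log(withNegatedLookarounds); // [ '12', '34', '5', '67' ]
```

Neither pattern consumes the surrounding whitespace, so no capturing group has to be extracted afterwards.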

As illustrated in the accepted answer, in languages that don't support lookbehind it is necessary to use a hack, e.g. including the preceding space in the match while keeping the important part in a capturing group:

(?:^|\s)(\d+)(?=\s|$)

(Debuggex JavaScript demo)

Then you just need to extract that capturing group from the matches; see e.g. this answer to "How do you access the matched groups in a JavaScript regular expression?"

Aprillion