7

I need to know if there is a regular expression for testing for the presence of numbers in strings that:

  • Matches Lorem 20 Ipsum
  • Matches Lorem 2,5 Ipsum
  • Matches Lorem 20.5 Ipsum
  • Does not match Lorem 2% Ipsum
  • Does not match Lorem 20.5% Ipsum
  • Does not match Lorem 20,5% Ipsum
  • Does not match Lorem 2 percent Ipsum
  • Does not match Lorem 20.5 percent Ipsum
  • Does not match Lorem 20,5 percent Ipsum
  • Matches Lorem 20 Ipsum 2% dolor
  • Matches Lorem 2,5 Ipsum 20.5% dolor
  • Matches Lorem 20.5 Ipsum 20,5% dolor

That is, a regular expression that can tell me whether a string contains one or more numbers that are not percentage values.

I've tried something like /[0-9\.,]+[^%]/, but it doesn't seem to work. I think that's because "digits followed by something that is not a percent sign" also matches the 20 in the string 20%. Additionally, I don't know how to exclude the whole word percent in addition to the % character.
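To see why the attempt fails, it helps to look at what the pattern actually captures. This is only a minimal sketch; the helper name `firstMatch` is made up for illustration.

```javascript
// The attempt from the question: digits/separators, then any character that is not '%'.
const attempt = /[0-9\.,]+[^%]/;

// Hypothetical helper: returns the matched text, or null when nothing matches.
function firstMatch(re, text) {
  const m = text.match(re);
  return m ? m[0] : null;
}

// The engine backtracks: "[0-9\.,]+" gives up the "0" so that "[^%]" can consume it.
console.log(firstMatch(attempt, "Lorem 20% Ipsum")); // "20"
// A single-digit percentage is rejected, because there is nothing to give back.
console.log(firstMatch(attempt, "Lorem 2% Ipsum"));  // null
```

So the negated character class consumes part of the number itself, which is why a lookahead (which consumes nothing) is the better tool here.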

lorenzo-s

3 Answers

13

This will do what you need:

\b                     -- word boundary
\d+                    -- one or more digits
(?:\.\d+)?             -- optionally followed by a period and one or more digits
\b                     -- word boundary
\s+                    -- one or more whitespace characters
(?!%|percent)          -- NOT followed by a % or the word 'percent'
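As far as I know, JavaScript regex literals cannot carry inline comments like the ones above, so the annotated pieces have to be joined on one line before use. A minimal sketch, where the names `plainNumber` and `hasPlainNumber` are assumptions for this example:

```javascript
// The annotated pattern above, joined into a single JavaScript literal.
const plainNumber = /\b\d+(?:\.\d+)?\b\s+(?!%|percent)/;

// Hypothetical helper: true when the text contains a number not used as a percentage.
function hasPlainNumber(text) {
  return plainNumber.test(text);
}

console.log(hasPlainNumber("Lorem 20.5 Ipsum"));         // true
console.log(hasPlainNumber("Lorem 20.5 percent Ipsum")); // false
console.log(hasPlainNumber("Lorem 20,5% Ipsum"));        // false
// Limitation: whitespace is required after the number, so this one is missed.
console.log(hasPlainNumber("The number is 5."));         // false
```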

--EDIT--

The meat here is the "negative lookahead" on the final line, which causes the match to fail if either a percent sign or the literal "percent" follows a number and one or more whitespace characters. Other uses of negative lookahead in JavaScript RegExps can be found in the question "Negative lookahead Regular Expression".

--2ND EDIT--

Congrats to Enrico for solving the most general case. While his solution below is correct, it contains several extraneous operators. Here is a more succinct solution.

(                         -- start capture
  \d+                     -- one or more digits
  (?:[\.,]\d+)?           -- optional period or comma followed by one or more digits
  \b                      -- word boundary
  (?!                     -- start negative lookahead
    (?:[\.,]\d+)          -- must not be followed by period or comma plus digits
  |                       --    or
    (?:                   -- start option group
      \s?%                -- optional space plus percent sign
    |                     --   or
      \spercent           -- required space and literal 'percent'
    )                     -- end option group
  )                       -- end negative lookahead
)                         -- end capture group
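Joined on one line, the pattern above can be checked against the examples from the question. This is just a sketch; the variable names are made up for illustration.

```javascript
// The annotated pattern above on one line.
const notPercent = /(\d+(?:[\.,]\d+)?\b(?!(?:[\.,]\d+)|(?:\s?%|\spercent)))/;

// Example strings from the question, grouped by expected outcome.
const shouldMatch = [
  "Lorem 20 Ipsum",
  "Lorem 2,5 Ipsum",
  "Lorem 20.5 Ipsum",
  "Lorem 20 Ipsum 2% dolor",
  "Lorem 2,5 Ipsum 20.5% dolor",
  "Lorem 20.5 Ipsum 20,5% dolor",
];
const shouldNotMatch = [
  "Lorem 2% Ipsum",
  "Lorem 20.5% Ipsum",
  "Lorem 20,5% Ipsum",
  "Lorem 2 percent Ipsum",
  "Lorem 20.5 percent Ipsum",
  "Lorem 20,5 percent Ipsum",
];

console.log(shouldMatch.every((s) => notPercent.test(s)));   // true
console.log(shouldNotMatch.some((s) => notPercent.test(s))); // false
// Caveat: only a single space is tolerated before "percent" in this variant.
console.log(notPercent.test("Lorem 2  percent Ipsum"));      // true
```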
Rob Raisch
  • You miss the , as decimal separator. (Just a tip) – Javier Diaz Nov 03 '12 at 19:48
  • Thanks for the comment, but the OP did not ask to extract numbers nor how many might occur in a string, only that they exist. My answer will match strings that contain numbers which are not percentages so matching the comma is immaterial to the OP's requirement. Thanks again. – Rob Raisch Nov 03 '12 at 19:52
  • This regex matches 20.5% because of the optionality of the decimal part – enrico.bacis Nov 03 '12 at 19:56
  • This regex works perfectly for the example that you gave us, but it doesn't get the `,` as a separator, just change the `(?:\.\d+)?` for `(?:[\.,]\d+)?` – Javier Diaz Nov 03 '12 at 20:00
  • OMG there's so much I do not know about regexp. Thank you Rob, thank you @JavierDiaz – lorenzo-s Nov 03 '12 at 20:03
  • I think the whitespaces shouldn't be required - and also they should be inside the lookahead – Bergi Nov 03 '12 at 20:07
  • @enrico, I do not believe you are correct. Due to the requirement that the number be followed by one or more spaces, 20.5% will not match. – Rob Raisch Nov 03 '12 at 20:07
  • @rob, Yes, written like that it will not match, but requiring a number to be followed by a space is wrong. It will not match numbers at the end of the string or numbers followed by anything that is not a space. For example, the sentences "I have to pay 12.5$" or "The number is 5." contain numbers that are not percentages. So your regex works correctly for all the test cases that lorenzo provided, but it will not always work. – enrico.bacis Nov 03 '12 at 20:18
  • @enrico.bacis You are right. However, your regexp matches any string that contains any type of number. @RobRaisch I can't work out how to edit `/\b\d+(?:[\.,]\d+)?\b\s+(?!%|percent)/` to fix the bug Enrico reported. – lorenzo-s Nov 03 '12 at 20:23
7

This is a more robust way to do it, and it also extracts the numbers.

(\b\d+(?:[\.,]\d+)?\b(?!(?:[\.,]\d+)|(?:\s*(?:%|percent))))

It is similar to Rob's regex but it should work for all the cases.

(                          -- capturing block
  \b                       -- word boundary
  \d+                      -- one or more digits
  (?:[\.,]\d+)?            -- optionally followed by a period or a comma
                              and one or more digits
  \b                       -- word boundary
  (?!                      -- not followed by
    (?:[\.,]\d+)           -- a period or a comma and one or more digits
                              [that is the trick]
    |                      -- or
    (?:\s*(?:%|percent))   -- zero or more spaces and the % sign or 'percent'
  )
)
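Since the pattern captures the number, adding the global flag lets you pull out every non-percentage number in a string. A minimal sketch, assuming an environment with `String.prototype.matchAll`; the helper name `extractNumbers` is hypothetical.

```javascript
// Same pattern as above, with the global flag so that every match is reported.
const numberNotPercent = /(\b\d+(?:[\.,]\d+)?\b(?!(?:[\.,]\d+)|(?:\s*(?:%|percent))))/g;

// Hypothetical helper: collects all non-percentage numbers as strings.
function extractNumbers(text) {
  return Array.from(text.matchAll(numberNotPercent), (m) => m[1]);
}

console.log(extractNumbers("Lorem 2,5 Ipsum 20.5% dolor 7 sit")); // ["2,5", "7"]
console.log(extractNumbers("I have to pay 12.5$"));               // ["12.5"]
console.log(extractNumbers("Lorem 20,5 percent Ipsum"));          // []
```

The numbers come back as strings, so a comma decimal such as "2,5" still needs its own conversion before it can be used as a number.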
enrico.bacis
0

Use negative lookahead instead of your negated character class:

/\d+(?:[,.]\d+)?(?!\s*(?:percent|%))/
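One caveat worth checking: nothing in this pattern pins down where the number ends, so the engine can backtrack to a shorter number whose next character is not a percent sign. The sketch below shows that, together with a guarded variant. The variant is my own assumption, not part of the answer above, and the variable names are made up.

```javascript
// The pattern above, unchanged.
const lookaheadOnly = /\d+(?:[,.]\d+)?(?!\s*(?:percent|%))/;

// Assumed variant: the lookahead also rejects a position that is followed by
// more of the same number, so backtracking cannot cut a number short.
const guarded = /\d+(?:[,.]\d+)?(?![,.]?\d|\s*(?:percent|%))/;

// Backtracking drops ".5", and "20" is then followed by "." rather than "%".
console.log("Lorem 20.5% Ipsum".match(lookaheadOnly)[0]); // "20"
console.log(guarded.test("Lorem 20.5% Ipsum"));           // false
console.log(guarded.test("Lorem 20.5 Ipsum"));            // true
```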
Bergi