
What's the right regular expression to find a numeral in a string in Lua? Because of the way parentheses are used in Lua regular expressions, it seems hard to correctly match the decimal point and the digits after it.

The workaround in the test code below works for my script's immediate needs, but accepts patterns like +1.23.45 as well.

--[+-]?(\d+(\.\d+)?|\.\d+)([eE][+-]?\d+)?  std regex for a numeral

s = "+1.23"
re = "([+-]?%d+[%.%d+]*)"
n = s:match (re)
print (n)
wp78de
  • How about this? `'([+-]?%d*%.?%d+)'` – tonypdmtr Jun 20 '19 at 20:53
  • But if you have `+1.23.45`, what is the expected result? Please let us know all the possible edge cases you need to handle. Do you need to validate the whole input string, or are you extracting several possible matches from a larger string? – Wiktor Stribiżew Jun 20 '19 at 21:47
  • Thank you, this is safer. Just a little astro (analemma) math for a friend, thought I'd learn Lua along with it this time. – Rahul Choudhary Jun 21 '19 at 05:36

1 Answer


If you insist on a loose definition of a numeric value like the one in your regular regex, we are in trouble, since Lua patterns do not support the alternation operator |.

The suggested pattern ([+-]?%d*%.?%d+) actually works for most cases. However, if you also want to allow cases like 42. (as the PCRE does), it will fail.
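
To make the difference concrete, here is a small sketch comparing the workaround from the question with the suggested pattern. The variable names `loose`, `suggested` and `anchored` are mine, and the anchored variant is only an assumption about how one might validate a whole string. Note that inside a bracket set `+` is a literal plus sign rather than a quantifier, which is why the workaround swallows +1.23.45 whole.

```lua
-- Pattern from the question: the set [%.%d+] matches '.', a digit, or a literal '+'.
local loose = "([+-]?%d+[%.%d+]*)"
-- Pattern suggested in the comments.
local suggested = "([+-]?%d*%.?%d+)"
-- Same pattern anchored to the whole string (hypothetical variant for validation).
local anchored = "^([+-]?%d*%.?%d+)$"

local s = "+1.23.45"
print(s:match(loose))      -- +1.23.45
print(s:match(suggested))  -- +1.23
print(s:match(anchored))   -- nil

-- A trailing dot is silently dropped when unanchored, and rejected when anchored.
print(("42."):match(suggested))  -- 42
print(("42."):match(anchored))   -- nil
```

So whether 42. "fails" depends on whether you extract (the dot is left behind) or validate (no match at all).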

We could try adding an optional extra dot outside the capturing parentheses, so that it falls off in cases like this: ([+-]?%d*%.?%d+)%.? This comes close, but it strips a final dot that is not followed by a digit and therefore produces false positives: .12. is accepted and returned as .12 *

*(Though effectively it's the same as your RE [+-]?(\d+(\.\d+)?|\.\d+) without the exponent part.
 In that case I would prefer a more complete RE like this: ^[+-]?((\d+(\.\d*)?)|(\.\d+))$)
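
Since Lua patterns have no |, one way to get the effect of that more complete RE is to try each branch as a separate pattern. The following is only a sketch under that assumption; `branches` and `match_numeral` are hypothetical names, not part of any library.

```lua
-- Emulate ^[+-]?((\d+(\.\d*)?)|(\.\d+))$ by trying each branch in turn.
local branches = {
  "^[+-]?%d+%.?%d*$",  -- digits, optional dot, optional fraction: 42, 42., 4.2
  "^[+-]?%.%d+$",      -- leading dot with a mandatory fraction: .5
}

-- Hypothetical helper: returns s if the whole string is a numeral, else nil.
local function match_numeral(s)
  for _, pattern in ipairs(branches) do
    local m = s:match(pattern)
    if m then return m end
  end
  return nil
end

print(match_numeral("42."))       -- 42.
print(match_numeral(".12."))      -- nil
print(match_numeral("+1.23.45"))  -- nil
```

This keeps 34. and friends while rejecting .12. outright, at the cost of two match calls instead of one.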

Demo code:

local re = "^([+-]?%d*%.?%d+)%.?$"
local inputs = {
    '123', '23.45', '.45', '-123', '-273.15', '-.45', '+516', '+9.8', '+.5', -- regular matches
    '34.', '+2.', '-42.',      -- only matched thanks to the problematic trailing optional dot
    '.', '-.', '+.', ' ', '',  -- expected: no match
    '.12.', '+.3.', '-.1.',    -- false positives (strictly speaking)
    '+1.23.45',                -- no match
}
-- Print the captured numeral for each input, or nil when nothing matches.
for _, s in ipairs(inputs) do
    print(s:match(re))
end

I think the first suggested option is acceptable. If even the second version still doesn't cut it I would suggest trying lrexlib, a multi-flavor regex library, or LPeg, a powerful text parsing library for Lua.
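
Before reaching for a library, the exponent part of the regex in the question can also be handled in plain Lua by splitting it off first and then checking the mantissa. This is a rough sketch, not a vetted implementation; `is_mantissa` and `is_numeral` are hypothetical helper names, and the mantissa rules follow the more permissive definition (42. is allowed).

```lua
-- Mantissa branches, as in ^[+-]?((\d+(\.\d*)?)|(\.\d+))$
local function is_mantissa(s)
  return s:match("^[+-]?%d+%.?%d*$") ~= nil
      or s:match("^[+-]?%.%d+$") ~= nil
end

-- Hypothetical validator for a whole string, with an optional exponent.
local function is_numeral(s)
  -- Split off a trailing exponent such as e10, E-3 or e+7, if there is one.
  local mantissa = s:match("^(.-)[eE][+-]?%d+$")
  -- Without a well-formed exponent, the whole string must be a mantissa.
  return is_mantissa(mantissa or s)
end

print(is_numeral("6.02E23"))   -- true
print(is_numeral("-.45e-2"))   -- true
print(is_numeral("1e"))        -- false
print(is_numeral("+1.23.45"))  -- false
```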

wp78de
  • Thank you, found the big RE in the wiki while revising. One would think this has to be already built into the Lua base, since it handles numbers of all kinds, and written of upfront, with its roots in processing text configuration files. Guess all the industrial uses must have one or the other add-on text parsing libs along with many more – Rahul Choudhary Jun 21 '19 at 05:47
  • Lua alone does not do much, that's true. Here are some [Lpeg Recipes](http://lua-users.org/wiki/LpegRecipes) that show how to match number patterns like the ones requested, and here is a helpful comparison of [LPEG and regex](http://www.gammon.com.au/lpeg). – wp78de Jun 24 '19 at 20:10
  • It does a lot, actually, for most needs, including this to ID a standard numeral, in <500 SLOC what other regex libs do in ~4000+ SLOC. Is there any little lib that adds the alternation operator and any other features that Robert may have pruned, a bit like grep & egrep? Regex libraries are a task in themselves to grasp and compare for memory, speed, and such. Thank you all for the recipes – Rahul Choudhary Jul 10 '19 at 01:03