
I'm looking for a regular expression to detect invalid floating point numbers in the sense that they cannot have two decimal points. Here is what I have, but it's not working:

import re
REAL = re.compile(r"^\d+\.\d+$")

Edit: I'm using python. In the big picture I'm writing a lexer to recognize a miniature version of the C syntax. A 2.3.4 is recognized as invalid, but a 13.4.5 is not. It has something to do with that, I suppose.


Sorry for the poorly formatted question. After reading through some comments I found the error elsewhere in the code. It turns out that using re.compile("0") and re.compile("1") earlier in my code was causing any sequence starting with a 0 or 1 to be 'picked up' as valid, regardless of the remainder of the sequence. Simply changing the patterns to "0$" and "1$" fixed my problem.
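
The behaviour described in this edit follows from how `re.match` works: it anchors only at the start of the string, so a pattern without a terminator accepts any input that merely begins with a match. A minimal sketch, assuming the earlier patterns looked roughly like the ones below (the names `BOOL_LOOSE` and `BOOL_STRICT` are made up for illustration; the original lexer code was never posted):

```python
import re

# Hypothetical names; the original lexer code was not posted.
BOOL_LOOSE = re.compile("1")    # no terminator: matches any string starting with "1"
BOOL_STRICT = re.compile("1$")  # terminator added: the match must reach the end

loose_hit = BOOL_LOOSE.match("13.4.5") is not None    # the prefix "1" is enough
strict_hit = BOOL_STRICT.match("13.4.5") is not None  # "$" rejects the trailing "3.4.5"
exact_hit = BOOL_STRICT.match("1") is not None        # the bare token still matches

print(loose_hit, strict_hit, exact_hit)
```

On Python 3.4 and later, calling `pattern.fullmatch(text)` is another way to require that the whole string matches without writing the `$` by hand.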

Clev3r
  • WHAT'S THE LANGUAGE? Please read the description of the tag you use. – Denys Séguret Mar 22 '13 at 20:25
  • @dystroy This looks like Python. – John Dvorak Mar 22 '13 at 20:26
  • That looks correct. On what strings is it working/not working? – Reinstate Monica -- notmaynard Mar 22 '13 at 20:29
  • @Clever, some more context, please. *What* exactly is not working? Some input and expected output would help. How are you running your test(s)? – Bart Kiers Mar 22 '13 at 20:29
  • That is, it matches valid floats. Are you trying to match invalid floats? – Reinstate Monica -- notmaynard Mar 22 '13 at 20:33
  • "They cannot have two decimal places" Are you trying to match floats that have this property, floats that don't have this property, or everything that isn't a float with two decimal places? – Asad Saeeduddin Mar 22 '13 at 20:34
  • I'm trying to find any invalid floats and point them out as a syntax error. I meant two decimal points, not two decimal places*, oops. – Clev3r Mar 22 '13 at 20:35
  • When I run it, it rejects both `1.3.4` and `13.4.5`. – Reinstate Monica -- notmaynard Mar 22 '13 at 20:38
  • What does "not working" mean? If I take your code, and then do `REAL.match('13.4.5')`, it returns `None`, exactly as it does for `1.3.4`. My guess is that there's a problem somewhere _else_ in your lexer that's causing, e.g., the `13.4` to get passed to this regex, and `.5` as another token. But without knowing anything else about your code, there's no way to guess what causes that. – abarnert Mar 22 '13 at 20:38
  • Can you show us a full, working bit of code that demonstrates what's going wrong? – Reinstate Monica -- notmaynard Mar 22 '13 at 20:39
  • @abarnert Thanks for pointing this out. I was earlier detecting for true/false as '0' or '1' but failed to include a $ to terminate the sequence, meaning any float starting with a 0 or 1 and continuing with two decimal points would get 'picked up' earlier. – Clev3r Mar 22 '13 at 20:41
  • @Clever: That's exactly why you need to give an [SSCCE](http://sscce.org) instead of giving a random bit of code that may or may not be where the problem lies, and can't be tested or debugged. – abarnert Mar 22 '13 at 20:54
  • As a side note, trying to build a lexer out of regexps for a language with an ambiguous lexical structure like C is probably not the best approach. – abarnert Mar 22 '13 at 20:59
  • related: [Python and regex question, extract float/double value](http://stackoverflow.com/questions/385558/python-and-regex-question-extract-float-double-value) – jfs Mar 23 '13 at 06:39

2 Answers


A simpler way would be doing this:

floatStr = '12.3.4'
try:
    float(floatStr)
except ValueError:
    # do something
    pass

In other words: try to parse the string, and if it fails, it's because the format is not that of a floating point number. No need to mess around with regular expressions here (the format of a valid floating-point number can be a bit tricky to get right) - just let the standard conversion function do the heavy lifting for you!
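
One way to package this approach for reuse is a small predicate. This is only a sketch: the helper name `is_valid_float` is an assumption, not something from the question. Note also that Python's `float()` accepts more than a miniature C grammar probably would (exponent notation such as `1e5`, the strings `inf` and `nan`, and surrounding whitespace), so a lexer may still need extra checks.

```python
def is_valid_float(text):
    """Return True if Python's float() accepts text, False otherwise.

    Hypothetical helper name; assumes text is a str.
    """
    try:
        float(text)
    except ValueError:
        return False
    return True


# "12.3.4" is rejected; "1e5" and "inf" are accepted by float(),
# which may be more permissive than the language being lexed.
for candidate in ("12.34", "12.3.4", "1e5", "inf"):
    print(candidate, is_valid_float(candidate))
```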

Óscar López
  • That doesn't solve his problem. It passes floats with two decimal places, like `12.34`, which is what he specifically said he wants to stop. – abarnert Mar 22 '13 at 20:35
  • @abarnert I think he meant two "decimal places" as in two `.`s. – arshajii Mar 22 '13 at 20:36
  • @abarnert see the other comment, A.R.S. is right. – Óscar López Mar 22 '13 at 20:36
  • @A.R.S.: Yes, it looks like he's changed his question, first to give an example `13.4.5`, and then to change "decimal place" to "decimal point". So, this is answering what he _meant_ to ask, which is even better than answering what he actually asked. :) – abarnert Mar 22 '13 at 20:37
  • I still wouldn't use the variable name `str`, and I'd add a `pass` within the `except` block. Am I too obsessive? – kirpit Mar 22 '13 at 20:48
  • @kirpit no, they're fine suggestions. I updated my answer with them. – Óscar López Mar 22 '13 at 20:51

Your problem is not actually in this code at all.

As a quick test shows, with REAL = re.compile(r"^\d+\.\d+$"), REAL.match('13.4.5') returns None, just as REAL.match('2.3.4') does.
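
That quick test can be reproduced along these lines. The `results` dictionary and the extra input `13.4` are additions for illustration; the pattern itself is the one from the question, written as a raw string.

```python
import re

# The pattern from the question, as a raw string so "\d" is not
# treated as a string escape.
REAL = re.compile(r"^\d+\.\d+$")

# Map each input to whether REAL matches it.
results = {text: REAL.match(text) is not None
           for text in ("2.3.4", "13.4.5", "13.4")}
print(results)
```

Both strings with two decimal points are rejected, while the single-point `13.4` is accepted.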

The problem must be that some earlier code is matching '13.4.5' in some way that causes it to either eat the rest of the token, or eat enough of it that what remains (e.g., '4.5') is a valid float. Without seeing your code, nobody can guess what exactly the problem is.

But, as it turns out, you've got another regex that matches '1' without a terminator, so whatever code you have that builds lexemes out of regex matches accepts all of '13.4.5'. Again, without seeing your code, nobody can guess why exactly that happens…
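
Since the actual lexer was never posted, the following is only a guess at the kind of structure that would produce this symptom: a classifier that tries patterns in order and takes the first hit. Every name here (`LOOSE_RULES`, `STRICT_RULES`, `classify`, the `BOOL` and `REAL` labels) is hypothetical.

```python
import re

# Hypothetical rule tables; the asker's real lexer was not posted.
LOOSE_RULES = [
    ("BOOL", re.compile("0")),            # unanchored at the end
    ("BOOL", re.compile("1")),            # unanchored at the end
    ("REAL", re.compile(r"^\d+\.\d+$")),
]
STRICT_RULES = [
    ("BOOL", re.compile("0$")),
    ("BOOL", re.compile("1$")),
    ("REAL", re.compile(r"^\d+\.\d+$")),
]


def classify(lexeme, rules):
    """Return the name of the first rule whose pattern matches, else 'INVALID'."""
    for name, pattern in rules:
        if pattern.match(lexeme):
            return name
    return "INVALID"


for lexeme in ("2.3.4", "13.4.5", "13.4", "1"):
    print(lexeme, classify(lexeme, LOOSE_RULES), classify(lexeme, STRICT_RULES))
```

With the loose table, `13.4.5` is claimed by the `"1"` rule before `REAL` is ever consulted, while `2.3.4` falls through to `INVALID`, which mirrors the behaviour reported in the question. With the terminated patterns, both are rejected.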

abarnert