
This question is a continuation of this question. The problem is that the regex "[-+]?\\d*\\.?\\d+([eE][-+]?\\d+)?" doesn't correctly find doubles.

For instance, the input sdf9.99e.23 contains no doubles, because if there is an [eE], it MUST be followed by a [+-] or just a digit [0-9].

So I need some kind of "if" in the regex. In pseudo-code it would look like this: if (char[i] == 'e' || char[i] == 'E') then char[i+1] must be '+', '-' or a digit, else return null.
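
Java's regex engine has no if/then/else conditional construct, but the "if" above can be approximated with lookarounds. The sketch below is one possible reading of the rule, not a solution taken from this thread. It keeps the original mantissa `[-+]?\d*\.?\d+` but wraps it in an atomic group, so that `9.99` cannot be shortened to `9.9` and matched anyway, and then requires either a complete exponent or no `e`/`E` at all. The two lookbehinds are an extra assumption: nothing may start inside another number or right after a digit followed by `e`/`E` (so `.23` in `sdf9.99e.23` is rejected too). The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StrictDoubleRegex {

    // Hypothetical single pattern: the "if" is expressed with lookarounds
    // and an atomic group, since java.util.regex has no conditionals.
    private static final Pattern DOUBLE = Pattern.compile(
            "(?<![\\d.])"            // do not start inside another number
          + "(?<!\\d[eE][-+]?)"      // nor right after a digit followed by e/E (and optional sign)
          + "[-+]?"                  // optional sign
          + "(?>\\d*\\.?\\d+)"       // mantissa; atomic, so "9.99" cannot shrink to "9.9" and retry
          + "(?:[eE][-+]?\\d+"       // either a complete exponent...
          + "|(?![eE]))");           // ...or no e/E directly after the mantissa

    /** Returns every match of DOUBLE in the input, in order of appearance. */
    public static List<String> findAll(String input) {
        List<String> found = new ArrayList<>();
        Matcher m = DOUBLE.matcher(input);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(findAll("sdf9.99e.23"));
        System.out.println(findAll("a 9.99 b 1.5e-3 c 42"));
    }
}
```

With this reading, `sdf9.99e.23` yields no matches, while `a 9.99 b 1.5e-3 c 42` yields `9.99`, `1.5e-3` and `42`.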

Community
Helgus
  • So would `sdf9.99f.23` contain a double (or two)? I'd argue that `sdf9.99e.23` contains *two* double values: `9.99` and `.23`. The whole problem of parsing unstructured text is that it's simply *far too open to interpretation*. Unless you have a **very specific definition** you can *always* find a case that could be argued over. – Joachim Sauer Mar 01 '12 at 07:54
  • It will just find a double with value `9.99`; the dot after `e` is not accounted for in the regex. – Nishant Mar 01 '12 at 07:56
  • @JoachimSauer `sdf9.99e.23` contains no doubles. The parser should behave as follows: we find 9.99, but then there is "e" (or "E"), which means that [eE] must be followed by [+-] or a digit. In this case it is followed by a dot, so this number shouldn't be parsed any further. If there is no [eE], it should parse doubles the normal way. – Helgus Mar 01 '12 at 08:21
  • @Helgus: but *why* is `e` so special that it "*breaks*" the string? Why doesn't `f` do the same? Or an empty space? Why do you choose to *ignore* any other malformed non-number, but *that specific case* should cause your algorithm to return an error? In `1a` you find `1`, in `1b` you find `1`, in `1c` you find `1`, in `1d` you find `1`, but in `1e` you return an error. *Why* is that? – Joachim Sauer Mar 01 '12 at 08:38
  • @JoachimSauer because [eE] specifies that an exponent follows it. Example of a correct double to be parsed: 9.99e23 => the resulting double value is 9.99 * 10^23 (where ^ means "to the power of"). I think I explained it clearly enough. – Helgus Mar 01 '12 at 09:05
  • What about `e1e2e3e4e5e`? Does this have five, three, two or no doubles? – Peter Lawrey Mar 01 '12 at 09:06
  • @Helgus but `e` could also be a plain `e`. – Peter Lawrey Mar 01 '12 at 09:07
  • @PeterLawrey the first and last e shouldn't be considered (because there are no digits before/after those [eE]); that leaves `1e2e3e4e5`, and I think it would be handled like that. But I need a solution for my concrete example that also covers the previous solution. Call it "extended" or whatever you want. – Helgus Mar 01 '12 at 09:15
  • Suggestion: provide an exact definition of what should be considered a double. The regex provided works fine: it finds a match. – kirilloid Mar 01 '12 at 21:29
  • @kirilloid once more: if in string we have a digit AND after it there is `[eE]`, after `[eE]` MUST be (`[+-]` and digit) or digit. otherwise, there are no doubles. And if we have a digit and after it there is no `[eE]`, it should parse the double in the normal way (like it does now). – Helgus Mar 02 '12 at 08:07
  • According to this "if in string we have a digit AND after it there is [eE], after [eE] MUST be ([+-] and digit) or digit. otherwise, there are no doubles." there are doubles there: the first "9" is a good double, since you have said nothing about periods (dots). – kirilloid Mar 02 '12 at 08:43
  • @kirilloid: don't cavil at every word! I just need an "extension" to the existing regex; read all the messages carefully! I just need a regex that reacts correctly (i.e. returns no doubles) to a string like `9.99e.23` and still reacts to doubles correctly, as it does now. – Helgus Mar 02 '12 at 08:48
  • This is not an exact definition. Unfortunately for you, your "correctly" (according to the comments) differs from other people's. Therefore you need to provide an **exact** definition. "string like this 9.99e.23 is wrong" isn't a full and exact definition. Sorry, but if I (or others) don't cavil at every word, you'll get an inexact/wrong algorithm, or none at all. – kirilloid Mar 02 '12 at 08:55
  • @kirilloid "string like this 9.99e.23 is wrong" is just an example. The **exact definition** is the one I wrote before: [eE] specifies that an exponent follows it. Example of a correct double to be parsed: 9.99e23 => the resulting double value is 9.99 * 10^23 (where ^ means "to the power of"). I think I explained it clearly enough. – Helgus Mar 02 '12 at 09:06
  • @Helgus You didn't answer the question "how many doubles are in `e1e2e3e4e5e`". I'm also curious why `sdf9.99e.23` contains no doubles. I accept (but do not understand) that 9.99 is not a valid double for you (as you wrote earlier, there are no digits or +/- after `e`), but `.23` is still a correct double. – Betlista Mar 23 '12 at 09:21
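
For comparison with the definition being argued over: Java's own `Double.parseDouble` (whose accepted grammar is documented under `Double.valueOf(String)`) enforces the same exponent rule Helgus describes, since an `e`/`E` must be followed by an optionally signed integer. The hypothetical helper below only checks whether a whole string is accepted. Note that this method also trims surrounding whitespace and accepts forms such as `NaN`, `Infinity`, hexadecimal literals and a trailing `d`/`f`, so it is a reference point rather than a drop-in definition for searching inside text.

```java
public class ParseDoubleCheck {

    /** Hypothetical helper: true if Double.parseDouble accepts the whole string. */
    public static boolean isJavaDouble(String s) {
        try {
            Double.parseDouble(s);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String[] samples = {"9.99e23", "9.99e+23", ".23", "9.99e.23", "9.99e"};
        for (String s : samples) {
            System.out.println(s + " -> " + isJavaDouble(s));
        }
    }
}
```

Here `9.99e23`, `9.99e+23` and `.23` are accepted, while `9.99e.23` and `9.99e` are rejected because nothing numeric follows the `e`.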

1 Answer


Using telepathy (i.e. guessing) to extend your algorithm so that digits, a dot and then a non-digit are not treated as a number, I can suggest these three regexes. Apply them one after another to the same string and combine (concatenate, append) the results.

"[+-]?\\d+(?![\\d.])"               // ±digits with no digit or dot after them (i.e. an integer)
"[+-]?\\d+\\.\\d+(?![\\deE])"       // ±digits, dot, digits, with no digit or [eE] after them
"[+-]?\\d+\\.\\d+[eE][+-]?\\d+"     // full variant: ±digits, dot, digits, "e", ±digits

I tried to combine these into a single regexp, but unfortunately I couldn't get it to work.
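
A minimal Java sketch of the "apply them consecutively and concatenate" approach described above. It makes one assumption that is not in the answer: every pattern gets the same lookbehind prefix, so that no match can start inside another number or right after a digit followed by `e`/`E` and an optional sign. Without it, the integer pattern on its own also returns `99` from `9.99` and `23` from `9.99e23`. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DoubleFinder {

    // Assumed prefix (not in the answer): do not start inside another number,
    // nor right after a digit followed by e/E and an optional sign.
    private static final String START = "(?<![\\d.])(?<!\\d[eE][+-]?)";

    private static final Pattern[] PATTERNS = {
        // ±digits with no digit or dot after them (an integer)
        Pattern.compile(START + "[+-]?\\d+(?![\\d.])"),
        // ±digits, dot, digits, with no digit or [eE] after them
        Pattern.compile(START + "[+-]?\\d+\\.\\d+(?![\\deE])"),
        // full variant: ±digits, dot, digits, [eE], ±digits
        Pattern.compile(START + "[+-]?\\d+\\.\\d+[eE][+-]?\\d+")
    };

    /** Applies each pattern to the whole input in turn and appends all matches. */
    public static List<String> findDoubles(String input) {
        List<String> result = new ArrayList<>();
        for (Pattern pattern : PATTERNS) {
            Matcher m = pattern.matcher(input);
            while (m.find()) {
                result.add(m.group());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String[] samples = {"sdf9.99e.23", "a 42 b 3.14 c", "2.5e10 and 7"};
        for (String s : samples) {
            System.out.println(s + " -> " + findDoubles(s));
        }
    }
}
```

Because the results are concatenated per pattern, they come out grouped by pattern rather than in text order: `2.5e10 and 7` gives `[7, 2.5e10]`, while `sdf9.99e.23` gives an empty list.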

kirilloid