Background: I'm trying to incrementally parse expressions like "cos(1.2)". Now, to the actual question (note that the actual question is mostly in the next paragraph; the rest is ramblings about solutions that seem to almost work):
Suppose I have a String in Java which might start with a floating point number, and then has some more "stuff" after it. For instance, I might have 52hi (which starts with "52", and ends with "hi"), or -1.2e1e9 (which starts with "-1.2e1", also known as "negative twelve" and ends with "e9"). I want to parse this number into a double.
It's tempting to use Double.parseDouble, but this method expects the string as a whole to be a valid number, and throws an exception if not. The obvious thing to do is write a regular expression to separate out the number from the other stuff, and then use parseDouble.
If I were parsing integers, this wouldn't be too bad: something like -?[0-9]+ would do. (Even then, it's easy to forget an edge case, and now your users can't enter +9 for symmetry with -9, so the preceding regex should have been [-+]?[0-9]+ instead.) But for floats it's complicated; maybe something like this (ignore the fact that "." is not taken literally by default in most regex dialects):
[-+]?[0-9]*.?[0-9]*(e[-+]?[0-9]+)?
Except we just said that an empty string is a valid number. And so is ".e2". So probably something a bit more complicated. Or maybe I could have a "sloppy" regex like above that allows some non-numbers as long as it doesn't forbid any actual numbers. But at some point I start thinking to myself "isn't this supposed to be parseDouble's job?". It's doing most of the work needed to find out where in the string the number ends and other stuff begins, because otherwise it wouldn't be able to throw the exception. Why should I have to do it as well?
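For concreteness, here is a minimal sketch of the "stricter regex, then parseDouble" approach. The pattern requires at least one digit in the mantissa, so it rejects the empty string and ".e2". The class name PrefixDouble and the helper prefixLength are made up for illustration, and the pattern is one plausible choice rather than a complete grammar (it ignores hex floats, NaN, Infinity, and the f/d suffixes that parseDouble also accepts).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PrefixDouble {
    // Optional sign; then either digits with an optional fraction ("12", "12.", "12.5")
    // or a fraction with a leading dot (".5"); then an optional exponent.
    private static final Pattern NUMBER = Pattern.compile(
        "[-+]?(?:[0-9]+(?:\\.[0-9]*)?|\\.[0-9]+)(?:[eE][-+]?[0-9]+)?");

    /** Hypothetical helper: length of the numeric prefix of s, or 0 if there is none. */
    static int prefixLength(String s) {
        Matcher m = NUMBER.matcher(s);
        // lookingAt() anchors the match at the start but does not require
        // the whole string to match, unlike matches().
        return m.lookingAt() ? m.end() : 0;
    }

    public static void main(String[] args) {
        String input = "-1.2e1e9";
        int n = prefixLength(input);
        double value = Double.parseDouble(input.substring(0, n));
        System.out.println(value + " rest=" + input.substring(n)); // prints "-12.0 rest=e9"
    }
}
```

This still duplicates work that parseDouble does internally, which is exactly the complaint above.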
So I started looking to see whether there was anything else in the Java standard library that could help. My usual tool of choice is java.util.Scanner, which has a nice nextDouble() method. But Scanner works on "tokens", so nextDouble really means "get the next token and try to parse it as a double". Tokens are separated by delimiters, which by default are whitespace. So Scanner would have no trouble with "52 hi", but wouldn't work with "52hi". In theory, the delimiter can be any regular expression I choose, so all I have to do is concoct a regular expression that, when it matches, signifies the end of a number. But this seems even harder than directly writing a regular expression for the number itself.
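The token behaviour described above can be seen in a small sketch. The class ScannerTokenDemo and the helper startsWithDoubleToken are names invented for this illustration; the locale is pinned to Locale.US only so the result does not depend on the machine's default locale.

```java
import java.util.Locale;
import java.util.Scanner;

class ScannerTokenDemo {
    /** Hypothetical helper: true if the first whitespace-delimited token is a double. */
    static boolean startsWithDoubleToken(String s) {
        try (Scanner sc = new Scanner(s).useLocale(Locale.US)) {
            // hasNextDouble() inspects the whole next token, not just a prefix of it.
            return sc.hasNextDouble();
        }
    }

    public static void main(String[] args) {
        System.out.println(startsWithDoubleToken("52 hi")); // true: the token is "52"
        System.out.println(startsWithDoubleToken("52hi"));  // false: the token is "52hi"
    }
}
```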
I was about to give up hope when I found java.text.DecimalFormat, which explicitly says "I'll parse as far as I can, and I'll tell you how far I got so you can continue doing something else from that point". But it seems that it was primarily designed to format things for human consumption, and maybe parse things written by machines, but not to parse things written by humans, and it shows up in a bunch of little ways. For example, it "supports" scientific notation like "1.2e1", but if you use it, it will insist that the number must be in scientific notation and fail the parse if you enter "12" instead. One could try working around this by checking the spot where it failed and parsing just the stuff before that as a number, but this is error-prone and even more annoying than just writing a regex for floats.
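For reference, the "parse as far as I can and report where I stopped" behaviour comes from NumberFormat.parse(String, ParsePosition). The sketch below shows only the simple, non-scientific case; the class ParsePositionDemo and the helper parsePrefix are invented names, and pinning Locale.US and disabling grouping are assumptions made to keep the behaviour predictable. Exponent handling is deliberately left out, since that is where the trouble described above begins.

```java
import java.text.NumberFormat;
import java.text.ParsePosition;
import java.util.Locale;

class ParsePositionDemo {
    /**
     * Hypothetical helper: parses a number starting at pos and advances pos
     * past it. Returns null (leaving pos unchanged) if no number starts there.
     */
    static Number parsePrefix(String s, ParsePosition pos) {
        NumberFormat nf = NumberFormat.getNumberInstance(Locale.US);
        nf.setGroupingUsed(false); // do not treat ',' as part of the number
        return nf.parse(s, pos);
    }

    public static void main(String[] args) {
        String input = "52hi";
        ParsePosition pos = new ParsePosition(0);
        Number n = parsePrefix(input, pos);
        // The number is 52 and parsing stopped at index 2, just before "hi".
        System.out.println(n.doubleValue() + " rest=" + input.substring(pos.getIndex()));
    }
}
```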
Meanwhile in C, this would simply be sscanf("%lf"), and in C++ you can use a string stream to do basically the same thing. Is there really no equivalent in Java?