I want to parse number words into numbers and get an error when the string does not fully express a valid number, for example:
"Twenty two" => 22
"One hundred forty four" => 144
"Twenty bla bla" => error
"One hundred forty thousand one" => error
I tried to use com.ibm.icu.text.RuleBasedNumberFormat, but its parse() method parses only the beginning of the string, not the full string.
This is mentioned in their javadoc:
Parses text from the beginning of the given string to produce a number. The method might not use the entire text of the given string
The javadoc also mentions that a special rule set can be used, in combination with a RuleBasedCollator, to change the lenient parsing behavior, but I'm struggling to achieve this.
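    import com.ibm.icu.text.RuleBasedNumberFormat;
    import java.text.ParseException;
    import java.util.Locale;

    public class NumFormatter {
        // Returns the parsed value, or -1 when parse() throws.
        public static int numberFromString(String number, Locale locale) {
            RuleBasedNumberFormat numberFormat = new RuleBasedNumberFormat(locale, RuleBasedNumberFormat.SPELLOUT);
            try {
                return numberFormat.parse(number).intValue();
            } catch (ParseException e) {
                return -1;
            }
        }
    }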
    public class NumFormatterTest {
        @Test
        public void formatNumber_fromString() {
            Locale locale = new Locale("en");
            assertEquals(22, NumFormatter.numberFromString("twenty two", locale));
            assertEquals(-1, NumFormatter.numberFromString("three blablabla ", locale)); // Not OK: it returns 3, not -1.
        }
    }
pom.xml
    <dependency>
        <groupId>com.ibm.icu</groupId>
        <artifactId>icu4j</artifactId>
        <version>60.2</version>
    </dependency>
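One direction I am considering, as a rough sketch only: instead of parse(String), use a parse overload that takes a java.text.ParsePosition and reject the input unless the position ends at the end of the string. The class name StrictParse and the method parseWholeString below are hypothetical names of mine. The sketch is written against java.text.Format and demonstrated with the JDK's own NumberFormat so that it runs without ICU; it assumes that RuleBasedNumberFormat can be passed in as a java.text.Format and that it advances the ParsePosition only past the text it actually consumed.

```java
import java.text.Format;
import java.text.NumberFormat;
import java.text.ParsePosition;
import java.util.Locale;
import java.util.OptionalInt;

public class StrictParse {

    /**
     * Parses the trimmed input with the given format and returns a value
     * only when the parser consumed every character of it.
     * Hypothetical helper, not part of ICU or the JDK.
     */
    public static OptionalInt parseWholeString(Format format, String input) {
        String text = input.trim();
        ParsePosition pos = new ParsePosition(0);
        Object parsed = format.parseObject(text, pos);

        // After parsing, pos.getIndex() points just past the last character used.
        boolean consumedAll = !text.isEmpty() && pos.getIndex() == text.length();
        if (!consumedAll || !(parsed instanceof Number)) {
            return OptionalInt.empty();
        }
        return OptionalInt.of(((Number) parsed).intValue());
    }

    public static void main(String[] args) {
        // JDK formatter used here only to demonstrate the position check.
        Format jdkFormat = NumberFormat.getIntegerInstance(Locale.ENGLISH);
        System.out.println(parseWholeString(jdkFormat, "22"));           // fully consumed
        System.out.println(parseWholeString(jdkFormat, "3 blablabla ")); // trailing junk
    }
}
```

If the assumption about ICU holds, passing new RuleBasedNumberFormat(locale, RuleBasedNumberFormat.SPELLOUT) as the format argument should make "three blablabla " come back empty, which numberFromString could then map to -1. I have not confirmed this against icu4j 60.2.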
Has anyone had to deal with this before? Thank you in advance.