I'm trying to identify numbers and their corresponding magnitudes in a text, but I run into the following error:

UNABLE TO PARSE MAGNITUDE: 6,700

Here's a snippet from a larger codebase to show what I'm doing.

for(Quantity quantity: originalQuantities){
    y = Math.round(quantity.getMagnitude());

    if (( roleStrings.get(SemanticRole.TIME) != null && (roleStrings.get(SemanticRole.TIME)).contains(String.valueOf(y))))
        continue;
.........................

Quantity here is a class with the following definition:

public class Quantity
{
    private Float       magnitude;
    private String      multiplier;
    private String      unit;
    private UnitType    type;
    private Float       absoluteMagnitude;

enum UnitType
{
    TIME, MONEY, WEIGHT, VOLUME, NUMBER
}
public Quantity(String strMagnitude, String multiplier, String unit,
            String strType)
    {
        this.setMagnitude(strMagnitude);
        this.multiplier = multiplier;
        this.unit = unit;
        this.setType(strType);
    }

    public Float getMagnitude()
    {
        return magnitude;
    }

    public String getMultiplier()
    {
        return multiplier;
    }

    public String getUnit()
    {
        return unit;
    }

    public UnitType getType()
    {
        return type;
    }

How do I solve this? I tried using Locale and ParseFloat and other transformations but couldn't fix the issue.

Here is the code which parses magnitude:

public static List<Quantity> getQuantitiesFromString(String str) throws ParseException
{
    List<Quantity> quantities = new ArrayList<Quantity>();
    //final String REGEX = "^(\\+|-)?([1-9]\\d{0,2}|0)?(,\\d{3}){0,}(\\.\\d+)?";
    //NumberFormat numberFormat = NumberFormat.getNumberInstance(Locale.US);
    //String numberAsString = numberFormat.format(number);
    // optional +/- sign followed by numbers separated with a decimal

    Pattern pattern = Pattern.compile("^[-+]?[0-9]*\\.?[0-9]+");
    Pattern pattern1 = Pattern.compile("^[0-9][0-9,-]*-[0-9,-]*[0-9]");



    List<String> tokens = Arrays.asList(str.split(" "));

    for (int i = 0; i < tokens.size(); i++)
    {
        String magnitude = "";
        String multiplier = "";
        String unit = "";
        String type = "";

        boolean numFound = false;

        String token = tokens.get(i);

        // append all numbers matching pattern into a String
        Matcher matcher = pattern.matcher(token);
        Matcher matcher1 = pattern1.matcher(token);

        while (matcher.find())
        {
            numFound = true;
            magnitude += matcher.group();
        }

        //ignore for number ranges (e.g. 0-10)
        while (matcher1.find())
        {
            numFound = false;
            continue;
        }

        if (numFound)
        {
            // loop through all words starting from current word
            // keep adding valid unit words until an invalid unit word is
            // encountered
            for (int j = i; j < tokens.size(); j++)
            {
                // strip non-alphabetic chars from word
                String word = tokens.get(j).replaceAll("[^a-zA-Z$%]", "")
                        .toLowerCase();

                // see if the stripped word is a unit
                boolean validUnitWord = false;
                if (getUnitTypesMap().keySet().contains(word))
                {
                    validUnitWord = true;

                    if (getUnitTypesMap().get(word).equalsIgnoreCase(
                            "number"))
                    {
                        multiplier += multiplier.isEmpty() ? word : " "
                                + word;
                    }
                    else
                    {
                        unit += unit.isEmpty() ? word : " " + word;
                        type = getUnitTypesMap().get(word);
                    }
                }

                // break if invalid unit word; else keep searching in next
                // words

                // except for current word (index = i), in which case keep
                // searching regardless
                if (!validUnitWord && j != i)
                    break;
            }

            quantities.add(new Quantity(magnitude, multiplier, unit, type));
        }
    }

    return quantities;
}

EDIT

The "UNABLE TO PARSE MAGNITUDE" error occurred while I was experimenting with Locale.US.

I reverted to older code and now for a string like:

debentures amounting to Rs 6,700 crore

the output I get from getQuantitiesFromString is:

QUANTITY: [[magnitude=6.0, multiplier=crore, unit=, type=NUMBER, absoluteMagnitude=null]]

Everything after the comma is being ignored. I tried this regex to detect numbers like 22,00.15, 22,000,353, etc.:

"^(\+|-)?([1-9]\d{0,2}|0)?(,\d{3}){0,}(\.\d+)?"

But for some reason it doesn't work in my code.

serendipity
  • Where's the code that parses anything? – f1sh Dec 05 '16 at 14:31
  • Did you enter a correct locale? Notation of time, dates, money, and weight differs per locale. – Tschallacka Dec 05 '16 at 14:31
  • Another possibility is to replace the `,` with a `.`; then it should be possible to parse. – XtremeBaumer Dec 05 '16 at 14:44
  • In your locale, does 6,700 signify 6700 or 6.7? – Ole V.V. Dec 05 '16 at 15:10
  • You may want to look into `NumberFormat` and/or `DecimalFormat` for parsing locale-specific numbers. That’s what those classes are great for. – Ole V.V. Dec 05 '16 at 15:11
  • Possible duplicate of [How to parse “1,234.56” in Java as Double?](http://stackoverflow.com/questions/30623479/how-to-parse-1-234-56-in-java-as-double). – Ole V.V. Dec 05 '16 at 15:13
  • If I use this code as you suggested: `NumberFormat format = NumberFormat.getInstance(Locale.ENGLISH); Number number = format.parse(quantity.getMagnitude()); y = Math.round(number.floatValue());` I get the following error: `The method parse(String) in the type NumberFormat is not applicable for the arguments (Float)` – serendipity Dec 06 '16 at 07:12
  • Parsing is for converting a String into some other representation, e.g., a float. When `quantity.getMagnitude()` is already a float (which it seems to be, I still wish you would tell us), there is no point in parsing. This gets us back to @f1sh's question: where is the code that tries to parse magnitude 6,700? – Ole V.V. Dec 06 '16 at 09:53
  • @OleV.V. Just added the code that parses magnitude. Sorry for not posting it earlier. – serendipity Dec 06 '16 at 09:57
  • Thanks for adding the `getQuantitiesFromString()` code. First, I don’t see how the message “UNABLE TO PARSE MAGNITUDE: 6,700” or any `ParseException` can come out of it. Second, I cannot understand what it’s supposed to do, so you may want to provide a sample `str` argument and explain the desired outcome of it? – Ole V.V. Dec 06 '16 at 10:13
  • Does your `Quantity` class have a constructor `Quantity(String, String, String, String)`? If so, will you include it in the question? – Ole V.V. Dec 06 '16 at 10:36
  • @OleV.V. I've added more code to describe the constructor for Quantity and explained what getQuantitiesFromString does with a str example. Hope this helps. I think the solution to my problem for now lies in the regex I am using. This code is part of a much larger code where I am also detecting Named Entity tags using the Stanford NLP parser so could detect MONEY tag using that but that calls for a bigger code change. – serendipity Dec 06 '16 at 11:09
  • @OleV.V. Thank you for being so patient and helpful. Very grateful! – serendipity Dec 06 '16 at 11:09

1 Answer

Because of the ^, your pattern "^[-+]?[0-9]*\\.?[0-9]+" only matches at the beginning of the token 6,700. So it finds the 6 and never finds the 700. If you remove that ^, matcher.find() matches 6 and then 700, your loop concatenates the two matches, and your method will pass 6700 to your constructor.
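
To see the difference in isolation, here is a minimal sketch that mirrors the question's `while (matcher.find())` loop on the single token `6,700`. The class name `MagnitudeDemo`, the helper `collectMatches`, and the field names are made up for this illustration; only the two patterns come from the question and the fix described above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MagnitudeDemo {
    // The pattern from the question: ^ ties the match to the start of the token.
    static final Pattern ANCHORED = Pattern.compile("^[-+]?[0-9]*\\.?[0-9]+");
    // The same pattern with the ^ removed, as suggested above.
    static final Pattern UNANCHORED = Pattern.compile("[-+]?[0-9]*\\.?[0-9]+");

    // Hypothetical helper: concatenates every match found in the token,
    // the same way the question's loop builds the magnitude string.
    static String collectMatches(Pattern pattern, String token) {
        StringBuilder magnitude = new StringBuilder();
        Matcher matcher = pattern.matcher(token);
        while (matcher.find()) {
            magnitude.append(matcher.group());
        }
        return magnitude.toString();
    }

    public static void main(String[] args) {
        System.out.println(collectMatches(ANCHORED, "6,700"));   // prints 6
        System.out.println(collectMatches(UNANCHORED, "6,700")); // prints 6700
    }
}
```

Note that this relies on the loop gluing the separate matches together; it does not check that the commas sit in valid thousands positions, so a stricter pattern or `NumberFormat` may be preferable if the input can be malformed.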

Ole V.V.