I have this text: "The price is 1 000$ another pice 34 000 , 00 EUR. You have to pay 1400 EUR, and you have to pay extra 2000$". I want to extract the prices, but if the word "pay" or "pay extra" comes directly before a price, I must reject that price. I already have a regex that extracts the prices, which works, but I think I either need a second regex or need to modify this one so that it rejects a price preceded by a specific word. The output for my example should be: 1000,34000. My code:

String regex = "(([0-9]+[\\s,.]*)+)(\\$|EUR)";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(text);
while (matcher.find()) {
    price = matcher.group();
    // Cut off the decimal part after a comma
    if (price.contains(",")) {
        price = price.substring(0, price.indexOf(","));
    }
    // Keep digits only
    price = price.replaceAll("\\s", "").replaceAll("[^0-9]+", "");
    if (price.contains(",")) {
        price = price.replaceAll("\\,", "");
    } else {
        price = price.replaceAll("\\.", "");
    }
}

It gives me:

1000,34000,1400,2000

But I want only: 1000,34000. I must reject the prices that come after the words "pay" and "pay extra". Edit: the "." is there for prices like 1 000. 00

JavaCoder
  • 1
    See http://ideone.com/0O3Dp0. What are `.` for in the pattern? – Wiktor Stribiżew Jul 31 '17 at 12:11
  • You can use something of the likes `^(price\s(extra\s)?)` – Jeremy Grand Jul 31 '17 at 12:11
  • I think the solution of @WiktorStribiżew can help you Alcwak ;) – Youcef LAIDANI Jul 31 '17 at 12:18
  • @WiktorStribiżew `.` is for prices like 2 000 . 00. What do I have to add to your regex if I want to reject prices after more words? Right now I can reject after `"price", "extra"`, but what if I want more, like `"price","extra","more extra", "something else"`? – JavaCoder Jul 31 '17 at 12:31
  • Add more alternatives into Group 1. You may use it like `\\b(price|extra|more extra|something else)?\\b\\s*`. See [**this Java demo**](http://ideone.com/XFZdHT). – Wiktor Stribiżew Jul 31 '17 at 12:33

2 Answers

2

I understand you have strings where the decimal separator is a comma and dots are digit grouping symbols.

You may match the words `pay` or `pay extra` with an optional capturing group, `(\\bpay(?:\\s+extra)?\\s*)?`, and check whether that group matched. If it did, discard the match; otherwise, take the number and remove the `,` and the digits after it. Then just remove all non-digit symbols.

See the Java demo:

String text = "The price is 1 000$ another pice 34 000 , 00 EUR. You have to pay 1400 EUR, and you have to pay extra 2000$";
String regex = "(\\bpay(?:\\s+extra)?\\s*)?(\\d[\\d\\s,.]*)(?:\\$|EUR)";
Pattern pattern = Pattern.compile(regex);
Matcher m = pattern.matcher(text);
List<String> res = new ArrayList<>();
while (m.find()) {
    if (m.group(1) == null) {
        res.add(m.group(2).replaceAll(",\\s*\\d+|\\D", ""));
    }
}
System.out.println(res);
// => [1000, 34000]

Pattern details:

  • `(\\bpay(?:\\s+extra)?\\s*)?` - an optional capturing group matching the whole word `pay` or `pay extra` (with 1+ whitespaces in between) and then 0+ whitespaces (when the group does not match, `matcher.group(1)` is `null`)
  • `(\\d[\\d\\s,.]*)` - Group 2: a digit and then 0+ digits, whitespaces, `,` and/or `.` symbols
  • `(?:\\$|EUR)` - a non-capturing group matching either a `$` symbol or the `EUR` substring.

The `,\\s*\\d+|\\D` pattern matches either a `,` followed by 0+ whitespaces and 1+ digits, or any non-digit symbol.
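To see what the `,\\s*\\d+|\\D` cleanup does in isolation, here is a minimal sketch. The class name `CleanupSketch` and the method `normalize` are hypothetical names used only for illustration; the regex itself is the one from the answer.

```java
import java.util.regex.Pattern;

class CleanupSketch {
    // First alternative: a comma, optional whitespace, and the digits after it
    // (the decimal part). Second alternative: any single non-digit character.
    private static final Pattern CLEANUP = Pattern.compile(",\\s*\\d+|\\D");

    // Hypothetical helper: strips the decimal part and all non-digits.
    static String normalize(String rawAmount) {
        return CLEANUP.matcher(rawAmount).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(normalize("1 000"));        // 1000
        System.out.println(normalize("34 000 , 00 ")); // 34000
        System.out.println(normalize("1.000,00"));     // 1000
    }
}
```

Because the comma alternative is tried first, the digits after the comma are consumed together with it and never reach the output.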

NOTE: If you can have both `.` and `,` as a decimal separator, replace `,` with `[,.]` in the last regex (the `,\\s*\\d+|\\D` cleanup pattern). See this Java demo.
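The comments ask how to reject prices after a longer list of words. One possible sketch, not taken from the linked demos, builds the optional group from a list instead of hard-coding `pay`. The names `RejectWordsSketch`, `extractPrices` and `rejectWords` are assumptions made for this example; each phrase is quoted with `Pattern.quote` and its inner spaces are turned into `\\s+`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class RejectWordsSketch {
    // Hypothetical helper: returns the prices that are NOT preceded by a reject phrase.
    static List<String> extractPrices(String text, List<String> rejectWords) {
        if (rejectWords.isEmpty()) {
            // An empty alternation would match everywhere and reject every price.
            throw new IllegalArgumentException("rejectWords must not be empty");
        }
        // "pay extra" becomes \Qpay\E\s+\Qextra\E; phrases are joined with |
        String alternation = rejectWords.stream()
                .map(w -> Arrays.stream(w.trim().split("\\s+"))
                        .map(Pattern::quote)
                        .collect(Collectors.joining("\\s+")))
                .collect(Collectors.joining("|"));
        Pattern pattern = Pattern.compile(
                "(\\b(?:" + alternation + ")\\s*)?(\\d[\\d\\s,.]*)(?:\\$|EUR)");
        Matcher m = pattern.matcher(text);
        List<String> res = new ArrayList<>();
        while (m.find()) {
            if (m.group(1) == null) { // no reject phrase directly before the number
                res.add(m.group(2).replaceAll(",\\s*\\d+|\\D", ""));
            }
        }
        return res;
    }

    public static void main(String[] args) {
        String text = "The price is 1 000$ another pice 34 000 , 00 EUR. "
                + "You have to pay 1400 EUR, and you have to pay extra 2000$";
        System.out.println(extractPrices(text, Arrays.asList("pay", "pay extra")));
        // => [1000, 34000]
    }
}
```

The order of the phrases does not matter here: if `pay` matches but no number follows, the regex engine backtracks and tries `pay extra`.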

Wiktor Stribiżew
  • Can you write me a final example with these words before the price: `"price","extra","more extra", "something else"`? I tried adding this: `\\b(price|extra|more extra|something else)?\\b\\s*`, but I don't know exactly where to add it in the regex and what I have to remove. – JavaCoder Jul 31 '17 at 12:48
  • 1
    See http://ideone.com/tld8m5. These alternations should be written as `\b(var1|var2|varN)\b\s*`. I just shortened `pay` and `pay extra` into a single branch by using a [*non-capturing* optional group](https://stackoverflow.com/questions/3512471) to make the pattern more efficient. You should do the same to keep it robust. To make a non-capturing group optional, just add `?` after it (so that it looks like `(?:...)?`) – Wiktor Stribiżew Jul 31 '17 at 12:51
1

I would suggest the following method.

First I would get rid of the whitespace, since it carries no information we need to take into account when parsing.

Then I would replace the decimal comma with a decimal point, which `Double.parseDouble` can read.

Now let me show it in code:

String parsePrices(String input) {

    StringBuilder result = new StringBuilder();

    String preprocessedInput = input.
            replaceAll("\\s", "").                      // drop all whitespace
            replaceAll("(\\d)(\\,)(\\d)", "$1\\.$3");   // decimal comma -> decimal point

    // The negative lookbehind skips amounts directly preceded by "pay" or "payextra"
    Pattern p = Pattern.compile("(?<!pay|payextra)((?<=[^\\d])\\d+\\.?\\d+)(\\$|EUR)");
    Matcher m = p.matcher(preprocessedInput);

    while (m.find()) {
        // Separate entries with commas; this also avoids an exception when nothing matches
        if (result.length() > 0) {
            result.append(",");
        }
        result.append(String.format("%.0f", Double.parseDouble(m.group(1))));
    }

    return result.toString();
}

Where:

  • the first `replaceAll()` removes the whitespace
  • the second `replaceAll()` changes the decimal separator
  • the regular expression uses a negative lookbehind to exclude the prices that follow either `pay` or `payextra`
  • `String.format("%.0f", Double.parseDouble(m.group(1)))` lets you tune how precise you would like your prices to be.
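As a small illustration of the precision tuning mentioned in the last bullet, the sketch below formats an already normalized amount with a chosen number of decimals. The class `PrecisionSketch` and the method `formatAmount` are hypothetical, and passing `Locale.ROOT` is an addition not present in the answer: it is assumed here so that the decimal separator stays a `.` regardless of the default locale.

```java
import java.util.Locale;

class PrecisionSketch {
    // Hypothetical helper: formats a dot-decimal amount with the given number of decimals.
    static String formatAmount(String amount, int decimals) {
        return String.format(Locale.ROOT, "%." + decimals + "f", Double.parseDouble(amount));
    }

    public static void main(String[] args) {
        System.out.println(formatAmount("34000.00", 0)); // 34000
        System.out.println(formatAmount("34000.00", 2)); // 34000.00
        System.out.println(formatAmount("1000", 2));     // 1000.00
    }
}
```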
Alexey R.