
I have the following regex, which tries to parse a price out of a string:

  $pattern = '#([Ii][Dd][Rr].?\s*[0-9.,]+)|
                ([Rr][Pp].?\s*[0-9.,]+)|
                ([Pp][Rr][Ii][Cc][Ee]:?\s*[0-9.,]+)|
                (\s[0-9]+\s?[Kk]\s)|
                ([0-9]+[Rr][Bb])|
                ([0-9.,]+\s*[Rr][Ii][Bb][Uu])|
                (\b[0-9]+[.,][0-9]+[.,]?[0-9]+)#u';
 $matches = array();
 preg_match($pattern, $caption, $matches);

When tested with the following string:

"ABBY TOP
 Colour : POLKA BLACK
 Weight : 0,18
 Price : 185,000
 Material : Kaos Semi-Fleece
 Size : Panjang / Length: 55 cm (depan), 72 (belakang)"

This always picks up 0,18 as the price, whereas I want Price : 185,000 to be matched as the real price. Is there anything wrong with my regex?

adit
  • What exactly are you trying to match? The last optional match (`(\b[0-9]+[.,][0-9]+[.,]?[0-9]+)`) is going to match any number, even if it hits it before it gets to the word "price" or some other match. Do you really need that group? – elixenide Dec 15 '14 at 05:26
  • Don't use regex for this. Just split the string on newlines and then again on `:` to produce key-value pairs. – Evan Davis Dec 15 '14 at 05:27
  • I think I want to make it tighter such that it matches 3 digit (thousands) or 3 digit (millions), so 2.00 should be invalid, but 2.000 is valid .. I tried enforcing this by changing it to (\b[0-9]+[.,][0-9]{3}[.,]?[0-9]{3})#u'; but this doesn't work – adit Dec 15 '14 at 05:40
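The two suggestions in the comments above can be combined. The following is only a rough sketch, not code from the thread: `parseCaptionFields` and `isGroupedPrice` are hypothetical helper names, and it assumes that every field sits on its own line as `Key : Value` and that a valid price is 1 to 3 digits followed by one or more `.`/`,`-separated groups of exactly three digits.

```php
<?php
// Hypothetical helper: split a caption into lower-cased key => value pairs.
function parseCaptionFields(string $caption): array
{
    $fields = array();
    foreach (preg_split('/\R/', $caption) as $line) {
        // Split on the first colon only, so a value may itself contain colons.
        $parts = explode(':', $line, 2);
        if (count($parts) === 2) {
            $fields[strtolower(trim($parts[0]))] = trim($parts[1]);
        }
    }
    return $fields;
}

// Hypothetical helper: accept 2.000 or 185,000 or 1.250.000, reject 2.00 and 0,18.
function isGroupedPrice(string $value): bool
{
    return preg_match('/^\d{1,3}(?:[.,]\d{3})+\z/', $value) === 1;
}

$caption = "ABBY TOP\n Colour : POLKA BLACK\n Weight : 0,18\n Price : 185,000\n"
         . " Material : Kaos Semi-Fleece\n Size : Panjang / Length: 55 cm (depan), 72 (belakang)";

$fields = parseCaptionFields($caption);

// Only trust the "price" field, and only if it looks like a grouped number.
$price = (isset($fields['price']) && isGroupedPrice($fields['price']))
    ? $fields['price']
    : null;

var_dump($price);
```

This only helps when the caption actually contains a labelled `Price` line; captions that use forms such as `Rp`, `IDR`, `rb` or `ribu` would still need a pattern-based fallback.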

1 Answer

No offense, but... before I give you the answer, let me point out several fixes to apply to your regex.

To match both cases, [Ii][Dd][Rr] isn't a good idea: write idr as usual and add the case-insensitive flag i after the closing delimiter (#...#i).

Using \d over [0-9] makes the world happier.

Also, your Price entry is Price : 185,000, but the subpattern ([Pp][Rr][Ii][Cc][Ee]:?\s*[0-9.,]+) won't match it because of the space before the colon. Add \s* before the colon: price\s*:?\s*[0-9.,]+.
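As a small illustration of the three fixes above applied to the price alternative only (a sketch; the variable names `$before` and `$after` are just for this example):

```php
<?php
// Before: per-letter classes, [0-9], and no whitespace allowed before the colon.
$before = '#[Pp][Rr][Ii][Cc][Ee]:?\s*[0-9.,]+#u';

// After: case-insensitive flag, \d, and \s* in front of the optional colon.
$after = '#price\s*:?\s*[\d.,]+#i';

$line = 'Price : 185,000';

$m = array();
$beforeHits = preg_match($before, $line);     // 0: the space before ":" blocks the match
$afterHits  = preg_match($after, $line, $m);  // 1: $m[0] holds the whole "Price : 185,000"

var_dump($beforeHits, $afterHits, $m[0]);
```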


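Now, back to taking precedence into account. Each alternative below starts with .*?\K: with the s flag, .*? can scan across newlines from the start of the string, and \K drops the scanned prefix from the reported match. Because alternatives are tried in order, an earlier alternative wins whenever it can match anywhere in the string, no matter where a later one would match. You can use the same technique from this other answer of mine, which makes your regex: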

/^.*?\Kidr.?\s*[\d.,]+|
.*?\Krp.?\s*[\d.,]+|
.*?\Kprice\s*:?\s*[\d.,]+|
.*?\K\s\d+\s?k\s|
.*?\K\d+rb|
.*?\K[0-9.,]+\s*ribu|
.*?\K\b\d+[.,]\d+[.,]?\d+/xis
Regex101 Demo
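For reference, here is a sketch of how the pattern could be run from PHP against the caption in the question. It assumes the price alternative allows whitespace before the colon (price\s*:?\s*), as recommended earlier in this answer; without that, the `Price : 185,000` line is skipped and the last alternative picks up 0,18 again.

```php
<?php
// Caption from the question.
$caption = "ABBY TOP\n Colour : POLKA BLACK\n Weight : 0,18\n Price : 185,000\n"
         . " Material : Kaos Semi-Fleece\n Size : Panjang / Length: 55 cm (depan), 72 (belakang)";

// x: ignore layout whitespace, i: case-insensitive, s: "." also matches newlines.
$pattern = '/^.*?\Kidr.?\s*[\d.,]+|
             .*?\Krp.?\s*[\d.,]+|
             .*?\Kprice\s*:?\s*[\d.,]+|
             .*?\K\s\d+\s?k\s|
             .*?\K\d+rb|
             .*?\K[\d.,]+\s*ribu|
             .*?\K\b\d+[.,]\d+[.,]?\d+/xis';

$matches = array();
$found = preg_match($pattern, $caption, $matches);

if ($found === 1) {
    // The labelled price wins over the earlier bare number 0,18.
    echo $matches[0], PHP_EOL; // Price : 185,000
}
```

Since the capture groups are gone, the whole match is in `$matches[0]`; the numeric part can be pulled out afterwards with a second, simpler match such as `[\d.,]+`.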
Unihedron