For those who want to test the regex: https://regex101.com/
I'm working on an Indonesian price parser.
Say I have the examples below:
1) 150 k
2) 150 kilobyte
3) 150 ka
4) 150 k2
5) 150 k)
6) 150 k.
We know 1), 5), and 6) can be prices, while the rest obviously cannot be.
My real regex is a bit more complicated, but for simplicity,
let's say my regex is: [0-9]+(\s*[k])
This matches all of them, 1) through 6).
So I appended [^0-9a-zA-Z] to the regex: [0-9]+(\s*[k])[^0-9a-zA-Z]
Now I get only 1), 5), and 6), which is fine.
However, the problem is that the matches now include an unnecessary suffix like [ ) , ]
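To make the behavior above easy to reproduce outside regex101, here is a minimal Python sketch. It assumes the six samples sit on separate lines (as they would in a regex101 test box), so each one is followed by a newline; the names loose, strict, and matches are just illustrative.

```python
import re

# The six sample strings from the list above.
samples = ["150 k", "150 kilobyte", "150 ka", "150 k2", "150 k)", "150 k."]

# Simplified pattern from the question, and the version with the extra character class.
loose = re.compile(r"[0-9]+(\s*[k])")
strict = re.compile(r"[0-9]+(\s*[k])[^0-9a-zA-Z]")

def matches(pattern, text):
    """Return the full text of each match of `pattern` in `text`."""
    return [m.group(0) for m in pattern.finditer(text)]

# Assumption: one sample per line, each followed by a newline.
text = "\n".join(samples) + "\n"

loose_hits = matches(loose, text)    # six matches, all "150 k"
strict_hits = matches(strict, text)  # three matches, each with a trailing character

for hit in strict_hits:
    print(repr(hit))
```

The strict pattern consumes the character after "k", which is why that character ends up inside the match.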
How can I match just '150 k' without any suffix like [ ) , ] that is unrelated to the price?
Do I need one more step after matching 5) and 6) to strip those suffixes manually?
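For reference, one option that may avoid a separate clean-up step is a negative lookahead, which checks the next character without consuming it. The sketch below assumes the regex engine supports lookahead (Python's re module does) and applies it to the simplified pattern only, not the real one; find_price is a hypothetical helper name.

```python
import re

# (?![0-9a-zA-Z]) asserts that the next character is not alphanumeric,
# but consumes nothing, so the match itself stops right after "k".
price = re.compile(r"[0-9]+(\s*[k])(?![0-9a-zA-Z])")

samples = ["150 k", "150 kilobyte", "150 ka", "150 k2", "150 k)", "150 k."]

def find_price(text):
    """Return the matched price text, or None when the sample is not a price."""
    m = price.search(text)
    return m.group(0) if m else None

results = {s: find_price(s) for s in samples}

for sample, found in results.items():
    print(repr(sample), "->", repr(found))
```

Because the lookahead also succeeds at the end of the string, a bare '150 k' with nothing after it still matches. Another possibility would be to keep the original pattern and read only the part of the match before the trailing character, for example by wrapping the price portion in a capture group.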
Thank you in advance for any ideas.