
I need to extract numeric values from strings like "£17,000 - £35,000 dependent on experience"

([0-9]+k?[.,]?[0-9]+)

That string is just an example: I can also have 17k, 17.000, 17, or 17,000. Each string contains 0, 1, or 2 numbers (never more than 2); they can appear anywhere in the string, separated by anything else. I just need to extract them and store the first in one place and the second in another.

I came up with the regex above, but it gives me two matches (don't mind the k?[,.], it's correct), both in the $1 group. I need 17,000 in $1 and 35,000 in $2. How can I accomplish this? Using two different regexes would also be fine.

Samuele Mattiuzzo

3 Answers


Using regex

Every opening round bracket creates a new capturing group. So to get a second capturing group $2, you need to match the second number with another bracketed part of your regex, and of course you also need to match the part between the two numbers.

([0-9]+k?[.,]?[0-9]+)\s*-\s*.*?([0-9]+k?[.,]?[0-9]+)


It could also be that Solr has regex functions that put all matches into an array, which might be easier to use.
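The "all matches into an array" idea can be sketched outside Solr. The following is only an illustration in Python (an assumption, since the question does not name a host language); the helper name `extract_numbers` is hypothetical. It reuses the number pattern from the question, without the outer group so that `findall` returns whole matches.

```python
import re

# Number pattern from the question, minus the outer capture group.
NUMBER = re.compile(r"[0-9]+k?[.,]?[0-9]+")

def extract_numbers(text):
    """Return (first, second); a slot is None when that number is absent."""
    found = NUMBER.findall(text)  # every non-overlapping match, in order
    first = found[0] if len(found) > 0 else None
    second = found[1] if len(found) > 1 else None
    return first, second

print(extract_numbers("£17,000 - £35,000 dependent on experience"))
print(extract_numbers("up to 35,000"))
print(extract_numbers("no salary given"))
```

This handles strings with 0, 1, or 2 numbers without needing two capture groups in a single regex.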

stema
  • This is actually the best matcher, except that I don't always have a - between the numbers; it can be any character. – Samuele Mattiuzzo Jun 28 '11 at 14:47
  • @Samuele Mattiuzzo Then just try deleting `\s*-\s*`; the `.*?` part will then cover every character until the next match is found. – stema Jun 28 '11 at 14:49
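The variant suggested in the comments (two groups joined only by `.*?`) can be tried out as follows. This is a minimal sketch in Python, which is an assumption about the environment; `first_two` is a hypothetical helper name.

```python
import re

NUM = r"[0-9]+k?[.,]?[0-9]+"
# Two capture groups joined by a lazy "anything" gap, as the comment suggests.
PAIR = re.compile(r"(" + NUM + r").*?(" + NUM + r")")

def first_two(text):
    """Return (group 1, group 2) of the first match, or None if nothing matches."""
    m = PAIR.search(text)
    return (m.group(1), m.group(2)) if m else None

print(first_two("£17,000 - £35,000 dependent on experience"))
print(first_two("£17,000 to £35,000"))
```

One caveat worth checking against real data: because the gap may be empty, a string with a single number such as "35,000 only" can still match, with backtracking splitting that one number across both groups. If single-number strings occur, collecting all matches and taking the first two is likely the safer route.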

Match the entire range with 2 capture groups, rather than matching each amount separately with a single capture group:

([0-9]+k?[.,]?[0-9]+) - ([0-9]+k?[.,]?[0-9]+)

However, I'm worried (yeah, I'm minding it :p) about that regex as it will match some strange things:

182k,938 - 29.233333

Both of those values will be matched. The regex can definitely be improved if you can give more information about your input formats.
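The concern is easy to reproduce. The snippet below is a small Python check (Python is an assumption here, not something the question specifies) that runs the question's number pattern against the odd input above, and against "17k" from the question's own list of formats.

```python
import re

# Number pattern from the question, used unchanged apart from dropping the outer group.
NUMBER = re.compile(r"[0-9]+k?[.,]?[0-9]+")

odd = NUMBER.findall("182k,938 - 29.233333")
print(odd)  # ['182k,938', '29.233333']

# The pattern needs at least two digits, so the trailing "k" is not captured here.
short = NUMBER.findall("17k")
print(short)  # ['17']
```

Whether these results are acceptable depends on which formats actually appear in the data.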

NorthGuard

What about something along the lines of:

[£]?([0-9]+k?[.,]?[0-9]+) - [£]?([0-9]+k?[.,]?[0-9]+)

This should now give you two groups.

Edit: You might need to handle the spaces around the dash too.
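One possible way to deal with the spaces is to replace the literal " - " with `\s*-\s*`. The sketch below is in Python (an assumption about the environment), treats the £ sign as optional on both sides (also an assumption), and uses a hypothetical helper name `parse_range`.

```python
import re

NUM = r"([0-9]+k?[.,]?[0-9]+)"
# Optional pound sign before each number; any amount of whitespace around the dash.
RANGE = re.compile(r"£?" + NUM + r"\s*-\s*£?" + NUM)

def parse_range(text):
    """Return (low, high) as strings, or None when no dash-separated pair is found."""
    m = RANGE.search(text)
    return m.groups() if m else None

print(parse_range("£17,000 - £35,000 dependent on experience"))
print(parse_range("17,000-35,000"))
print(parse_range("£17,000 to £35,000"))
```

Because the dash is required, strings that separate the two numbers with anything else will not match this pattern.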