I use the following regex to extract values that appear before certain units:

([.\d]+)\s*(?:kg|gr|g)

I want each match to include the unit of that value as well. For example, from this string:

"some text 5kg another text 3 g more text 11.5gr end"

I should get:

["5kg", "3 g", "11.5gr"]

I can't wrap my head around how to modify the expression above to get the desired result. Thank you.

oussama
  • Which group are you looking at? Group 0 or group 1? – Nick ODell Nov 16 '22 at 23:49
  • You already have the match, see https://regex101.com/r/1IPomV/1 But re.findall returns only the capture group values. See [re.findall behaves weird](https://stackoverflow.com/questions/31915018/re-findall-behaves-weird) – The fourth bird Nov 17 '22 at 00:02
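The point made in the comments above can be sketched as follows: `re.findall` returns only the capture group contents when the pattern contains a group, so the original pattern already matches the unit but does not report it. The variable names here are illustrative, not from the question.

```python
import re

text = "some text 5kg another text 3 g more text 11.5gr end"

# With a capturing group, findall returns only what the group captured.
with_group = re.findall(r"([.\d]+)\s*(?:kg|gr|g)", text)
print(with_group)      # ['5', '3', '11.5']

# With the capturing parentheses removed, findall returns the whole match.
without_group = re.findall(r"[.\d]+\s*(?:kg|gr|g)", text)
print(without_group)   # ['5kg', '3 g', '11.5gr']

# finditer exposes the whole match as group(0), whatever groups exist.
whole = [m.group(0) for m in re.finditer(r"([.\d]+)\s*(?:kg|gr|g)", text)]
print(whole)           # ['5kg', '3 g', '11.5gr']
```

Dropping the parentheses is probably the smallest change; `finditer` is an option if the capture group is still needed for the numeric part.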

2 Answers

import re

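# Raw string so that \d, \s and \w reach the regex engine unchanged.
# (?<![\d.]) : do not start in the middle of a number
# (?!\w)     : the unit must not be followed by another word character
p = re.compile(r'(?<![\d.])\d+(?:\.\d+)?\s*(?:gr|kg|g)(?!\w)')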
print(p.findall("some text 5kg another text 3 g more text 11.5gr end"))
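As a rough illustration of what the two lookarounds in this answer contribute, the sketch below runs an equivalent pattern (written as a raw string, with the lookbehind expressed as a character class) against a few made-up inputs. The sample strings are assumptions chosen for the demonstration, not from the question.

```python
import re

# Equivalent to the answer's pattern; only the spelling differs.
pattern = re.compile(r"(?<![\d.])\d+(?:\.\d+)?\s*(?:gr|kg|g)(?!\w)")

# The question's values are matched together with their units.
basic = pattern.findall("5kg 3 g 11.5gr")
print(basic)        # ['5kg', '3 g', '11.5gr']

# (?!\w) rejects units that run on into a longer word.
longer_words = pattern.findall("7grams 2kgs")
print(longer_words) # []

# (?<![\d.]) stops a match from starting inside "1.2.3".
dotted = pattern.findall("version 1.2.3g")
print(dotted)       # []
```

Without the lookbehind, the last input would yield `3g`, which is likely not what is wanted for a dotted version string.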
Ricardo

Another solution (regex demo):

(?i)\b\d+\.?\d*\s*(?:kg|gr?)\b
  • (?i) - case insensitive
  • \b - word boundary
  • \d+\.?\d* - match the amount
  • \s* - any number of whitespace characters
  • (?:kg|gr?) - match kg, g or gr
  • \b - word boundary

import re

p = re.compile(r"(?i)\b\d+\.?\d*\s*(?:kg|gr?)\b")
print(p.findall("some text 5kg another text 3 g more text 11.5gr end"))

Prints:

['5kg', '3 g', '11.5gr']
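To see the effect of the `(?i)` flag and the two word boundaries, here is a small sketch with a made-up sample string (the inputs are assumptions for illustration only):

```python
import re

pattern = re.compile(r"(?i)\b\d+\.?\d*\s*(?:kg|gr?)\b")

# "5KG" and "10 Gr" match thanks to (?i).
# "3grams" fails the trailing \b, because the unit runs into more letters.
# "x5kg" fails the leading \b, because the digit directly follows a letter.
samples = "5KG, 10 Gr, 3grams, x5kg, 2.5g"
matches = pattern.findall(samples)
print(matches)  # ['5KG', '10 Gr', '2.5g']
```

Note that `\b` treats digits and letters alike as word characters, so an amount glued to a preceding letter is skipped entirely.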
Andrej Kesely