
I have a list input:

['ICE ERIE', 'ERIE', 'o9 ManGo', 'ManGo SLACKCURRAN 120mL', 'SLACKCURRAN']

How can I extract the following string from it?

'ManGo SLACKCURRAN 120mL'

Another example:

Input:

['SWANSON', 'Apple Cider Vinegar Food Supplement Supplement mg per tablet DOUBLE STRENGTH FORMULA per tablet 1 NET', 'Cider', 'Vinegar', 'Food Supplement DOUBLE', 'Supplement', '200', 'per', 'tablet', 'DOUBLE', 'TABLETS 1 NET WEIGHT: 62g', '1', 'NET', 'WEIGHT:']

Output:

'TABLETS 1 NET WEIGHT: 62g' 

My attempt:

import re
l = []
for each in input:
    # Keep only strings that look like "<number><unit>"
    if re.match('^\\d+\\.?\\d*(ounce|fl oz|foot|sq ft|pound|gram|inch|sq in|mL)$', each.lower()):
        l.append(each)
ssr

1 Answer


You can use

import re
input_l = ['ICE ERIE', 'ERIE', 'o9 ManGo', 'ManGo SLACKCURRAN 120mL', 'SLACKCURRAN']
reg = re.compile(r'\d*\.?\d+\s*(?:ounce|fl oz|foot|sq ft|pound|gram|inch|sq in|ml)\b', re.I)
print( list(filter(reg.search, input_l)) )
# => ['ManGo SLACKCURRAN 120mL']

See the Python demo.
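For the second list in the question, the pattern above would return nothing, because its unit list contains gram but no bare g, so 62g is not recognized. A minimal sketch, assuming that adding g to the alternation is acceptable for your data (this extra unit is an assumption, not part of the pattern above):

```python
import re

# Second example list from the question.
input_l = [
    'SWANSON',
    'Apple Cider Vinegar Food Supplement Supplement mg per tablet '
    'DOUBLE STRENGTH FORMULA per tablet 1 NET',
    'Cider', 'Vinegar', 'Food Supplement DOUBLE', 'Supplement', '200',
    'per', 'tablet', 'DOUBLE', 'TABLETS 1 NET WEIGHT: 62g', '1', 'NET',
    'WEIGHT:',
]

# Same pattern as above, plus a bare 'g' unit (assumption: the original
# alternation only has 'gram', which does not match '62g').
reg = re.compile(
    r'\d*\.?\d+\s*(?:ounce|fl oz|foot|sq ft|pound|gram|inch|sq in|ml|g)\b',
    re.I,
)

matches = [s for s in input_l if reg.search(s)]
print(matches)
# => ['TABLETS 1 NET WEIGHT: 62g']
```

Strings such as '200' or '1 NET' are skipped because the number must be followed directly (optionally after whitespace) by one of the listed units.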

Notes:

  • Use re.search to find matches anywhere inside the string (re.match only matches at the start of the string)
  • Remove the ^ (start of string) and $ (end of string) anchors
  • Use the re.I flag for case-insensitive matching
  • \d*\.?\d+ is a more convenient pattern for matching integer or float numbers, as it also supports numbers like .95
  • End the pattern with a word boundary (\b) to match units of measurement as whole words (mind the r prefix before the string literal).
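
The notes above can be checked with a short sketch. The variable names and the test strings ('3 grammar books', '1.') are made up for illustration:

```python
import re

units = r'(?:ounce|fl oz|foot|sq ft|pound|gram|inch|sq in|ml)'
with_boundary = re.compile(r'\d*\.?\d+\s*' + units + r'\b', re.I)
no_boundary = re.compile(r'\d*\.?\d+\s*' + units, re.I)

s = 'ManGo SLACKCURRAN 120mL'

# re.match only tries position 0; re.search scans the whole string.
match_result = with_boundary.match(s)
search_result = with_boundary.search(s)
print(match_result)           # None
print(search_result.group())  # 120mL

# Which number shapes does \d*\.?\d+ accept as a whole string?
number = re.compile(r'\d*\.?\d+')
shapes = [number.fullmatch(x) is not None for x in ['12', '1.5', '.95', '1.']]
print(shapes)  # [True, True, True, False]

# The word boundary stops 'gram' from matching inside 'grammar'.
bounded = with_boundary.search('3 grammar books')
unbounded = no_boundary.search('3 grammar books')
print(bounded)            # None
print(unbounded.group())  # 3 gram
```

Note that the number pattern does not accept a trailing dot such as 1., which is usually fine for measurements.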
Wiktor Stribiżew