Let's say I have the following string:

ing = "2 cup butter, softened"

and I only want `butter` from the string. So far I have done the following:

ing = ing.replace('2', '').replace('cup', '').replace(', ', '').replace('softened', '')
ing = ing.strip()

EDIT

Traceback (most recent call last):
  File "parsley.py", line 107, in <module>
    leaf.write_ingredients_to_csv()
  File "parsley.py", line 91, in write_ingredients_to_csv
    out = re.sub(words, '', matched)
  File "C:\Users\Nikhil\Anaconda3\lib\re.py", line 191, in sub
    return _compile(pattern, flags).sub(repl, string, count)
  File "C:\Users\Nikhil\Anaconda3\lib\re.py", line 301, in _compile
    p = sre_compile.compile(pattern, flags)
  File "C:\Users\Nikhil\Anaconda3\lib\sre_compile.py", line 562, in compile
    p = sre_parse.parse(p, flags)
  File "C:\Users\Nikhil\Anaconda3\lib\sre_parse.py", line 855, in parse
    p = _parse_sub(source, pattern, flags & SRE_FLAG_VERBOSE, 0)
  File "C:\Users\Nikhil\Anaconda3\lib\sre_parse.py", line 416, in _parse_sub
    not nested and not items))
  File "C:\Users\Nikhil\Anaconda3\lib\sre_parse.py", line 752, in _parse
    len(char) + 1)
sre_constants.error: unknown extension ?| at position 23

Is there a more efficient way of doing this in Python 3? What I have shown is just one example of the strings I am processing. There are many more strings with different words that I need to remove, such as cups, cup, tablespoons and teaspoon. I am currently chaining `replace` calls in the same way to eliminate those words, so is there a better way of doing this?

Nikhil Raghavendra

1 Answer

You may want to use regular expressions.

import re

words = r'oz|lbs?|cups?|tablespoons?|teaspoons?|softened'
words_rm = r'slices?|shredded|sheets?|cans?|\d ?g\b'
other = r'[\d,;#\(\)\[\]\.]'
ing = "2 cup butter, softened"
out = re.sub(words, '', ing)
out = re.sub(words_rm, '', out)
out = re.sub(other, '', out)
out.strip()
# returns:
'butter'
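
One caveat with the patterns above: they have no word boundaries, so `cans?` would also match inside `pecans` and `oz` inside `frozen`. A minimal sketch, assuming you only want whole-word matches, combines the words into one compiled pattern anchored with `\b`. The names `UNIT_WORDS`, `NOISE` and `clean_ingredient` are illustrative and not part of the code above, and the word list is only a starting point.

```python
import re

# Hypothetical word list; extend with whatever units and descriptors you need.
UNIT_WORDS = re.compile(
    r'\b(?:oz|lbs?|cups?|tablespoons?|teaspoons?|softened|'
    r'slices?|shredded|sheets?|cans?)\b',
    flags=re.IGNORECASE,
)
# Digits and punctuation to drop after the words are removed.
NOISE = re.compile(r'[\d,;#()\[\].]')


def clean_ingredient(text):
    """Strip unit words, digits and punctuation, then collapse whitespace."""
    text = UNIT_WORDS.sub('', text)
    text = NOISE.sub('', text)
    return ' '.join(text.split())


print(clean_ingredient("2 cup butter, softened"))  # butter
print(clean_ingredient("1 cup chopped pecans"))    # chopped pecans
```

Compiling the pattern once also avoids re-parsing it for every string when you process many ingredients in a loop.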
James
  • I get this error, `unknown extension ?| at position 23`. How do I solve it? – Nikhil Raghavendra Jan 20 '18 at 05:18
  • I would need the full traceback – James Jan 20 '18 at 05:20
  • What are you using for `words`? – James Jan 20 '18 at 05:27
  • I am using, `words = r'oz?|lbs?|cups?|lb?|.?|(?|cup?|tablespoons?|tablespoon?|teaspoons?|teaspoon?|)'` and `words_rm = r'slices?|slice?|shredded?|sheets?|sheet?|g ?|cans ?|can'` – Nikhil Raghavendra Jan 20 '18 at 05:30
  • The period `'.'` and the opening parenthesis `'('` are special characters in regex, and `(?` starts an extension group, so `(?|` makes your pattern an invalid expression. The question mark indicates that the preceding character is optional, i.e. `'cups?'` will match both `'cup'` and `'cups'`. Please see my updated answer. – James Jan 20 '18 at 05:36
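
To illustrate the comments above: a pattern containing `(?|` fails to compile, which is what raises `unknown extension ?|`. The following is a minimal sketch, assuming the strings to remove are kept in a plain list, that lets `re.escape` handle metacharacters such as `.` and `(` so they are matched literally. The names `build_pattern` and `words_to_remove` are hypothetical.

```python
import re

# Reproduce the error: "(?" starts an extension group, and "(?|" is not a valid one.
try:
    re.compile(r'lb?|.?|(?|cup?')
except re.error as exc:
    print("invalid pattern:", exc)


def build_pattern(words):
    """Join literal words into one alternation, escaping regex metacharacters."""
    # Longest first, so 'cups' is tried before 'cup'.
    ordered = sorted(words, key=len, reverse=True)
    return re.compile('|'.join(re.escape(w) for w in ordered))


# Hypothetical list of literal strings to strip, including '.' and '('.
words_to_remove = ['cups', 'cup', 'softened', '.', '(', ')', ',']
pattern = build_pattern(words_to_remove)

cleaned = pattern.sub('', "2 cups (softened) butter.")
print(' '.join(cleaned.split()))  # 2 butter
```

Listing the plural explicitly and sorting longest first replaces the hand-written `?` suffixes, at the cost of a slightly longer word list.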