40

I want to extract information from user-inputted text. Imagine I input the following:

SetVariables "a" "b" "c"

How would I extract information between the first set of quotations? Then the second? Then the third?

NG_
Reznor

3 Answers

65
>>> import re
>>> re.findall('"([^"]*)"', 'SetVariables "a" "b" "c" ')
['a', 'b', 'c']
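
To get at the first, second and third values individually, the same pattern can be wrapped in a small function and the result indexed or unpacked. This is a minimal sketch: the names `QUOTED` and `extract_quoted` are made up for illustration, and the unpacking assumes the input holds exactly three quoted values.

```python
import re

# Hypothetical names; the pattern itself is the one used in the answer above.
QUOTED = re.compile(r'"([^"]*)"')


def extract_quoted(text):
    """Return the text found between each pair of double quotes, in order."""
    return QUOTED.findall(text)


values = extract_quoted('SetVariables "a" "b" "c"')

# Unpacking assumes exactly three quoted values; otherwise index into `values`.
first, second, third = values
print(first, second, third)
```

If the number of quoted values can vary, indexing (`values[0]`, `values[1]`, ...) after a length check is probably safer than unpacking.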
jspcal
  • 2
    Is the semicolon at the end of the line needed? – User Mar 14 '14 at 18:16
  • @User good catch, four years after no one else noticed. Note that semicolons (`;`) let you put two statements on one line, but this practice is generally discouraged. – WinEunuuchs2Unix Feb 21 '22 at 21:32
44

You could call split('"') on the string. If the string is well formed (that is, it contains an even number of quotation marks), every odd-indexed element of the resulting list is a value that was between quotation marks.

>>> s = 'SetVariables "a" "b" "c"'
>>> l = s.split('"')[1::2]  # the [1::2] slice keeps the odd-indexed elements
>>> print(l)
['a', 'b', 'c']
>>> print(l[2])  # individual items can be read by index
c
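
The split approach relies on the quotation marks being balanced. A minimal sketch of guarding against that is shown here; `split_quoted` is a hypothetical helper name, and raising `ValueError` is just one possible way to report bad input.

```python
def split_quoted(text):
    """Extract quoted values with str.split, rejecting unbalanced quotes."""
    if text.count('"') % 2 != 0:
        raise ValueError("unbalanced double quotes in input")
    # Splitting on '"' leaves the quoted text at odd indices: 1, 3, 5, ...
    return text.split('"')[1::2]


print(split_quoted('SetVariables "a" "b" "c"'))

# Without the check, an unterminated value is returned as if it were quoted.
unchecked = 'SetVariables "a" "b'.split('"')[1::2]
print(unchecked)
```

Here the unchecked call returns `b` as a value even though its closing quote is missing, which is why the count check may be worth its small cost on user-entered text.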

This is also faster than the regular-expression approach. Measured with the timeit module, the split version runs about 4 times faster (0.569 usec versus 2.37 usec per loop):

% python timeit.py -s 'import re' 're.findall("\"([^\"]*)\"", "SetVariables \"a\" \"b\" \"c\" ")'
1000000 loops, best of 3: 2.37 usec per loop

% python timeit.py '"SetVariables \"a\" \"b\" \"c\"".split("\"")[1::2];'
1000000 loops, best of 3: 0.569 usec per loop
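
The comparison can also be reproduced from inside Python with the standard `timeit` module. This is a rough sketch, not the original benchmark: the function names and the loop count of 100,000 are arbitrary choices, and the absolute numbers will differ by machine and Python version.

```python
import re
import timeit

TEXT = 'SetVariables "a" "b" "c"'
LOOPS = 100_000  # arbitrary loop count chosen for this sketch


def with_regex():
    return re.findall(r'"([^"]*)"', TEXT)


def with_split():
    return TEXT.split('"')[1::2]


# Both approaches should agree before their speed is compared.
assert with_regex() == with_split()

for func in (with_regex, with_split):
    seconds = timeit.timeit(func, number=LOOPS)
    print(f"{func.__name__}: {seconds / LOOPS * 1e6:.3f} usec per loop")
```

Calling through a function adds some overhead to both measurements, so the ratio may not match the command-line figures exactly.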
Roman
15

Regular expressions are good at this:

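import re

userInputtedText = 'SetVariables "a" "b" "c"'  # example input
# The capture group keeps only the text between the quotes.
quoted = re.compile('"([^"]*)"')
for value in quoted.findall(userInputtedText):
    print(value)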
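
One detail of `re.findall` matters for this task: when the pattern has no capture group it returns each whole match, quotation marks included, and when the pattern has exactly one group it returns only that group's text. The short sketch below illustrates the difference; the variable names are chosen just for this example.

```python
import re

text = 'SetVariables "a" "b" "c"'

# No capture group: each result is the whole match, quotes included.
whole_matches = re.findall(r'"[^"]*"', text)

# One capture group: each result is only the text inside the group.
inner_only = re.findall(r'"([^"]*)"', text)

print(whole_matches)
print(inner_only)

# Stripping the first and last character of each whole match gives the same values.
stripped = [match[1:-1] for match in whole_matches]
print(stripped == inner_only)
```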
Sumit Singh
Alex Martelli