I have the following data:

string = ' streptococcus 7120 "File  being  analysed" rd873 '

I tried splitting the line with n = string.split(), which gives the result below:

['streptococcus', '7120', '"File', 'being', 'analysed"', 'rd873']

I would like to split the string while ignoring the whitespace inside the double quotes.

Expected output:

['streptococcus', '7120', 'File being analysed', 'rd873']
  • Is it possible that you will have nested quotes (e.g. `"File name "foo" being analyzed"`)? – senshin Feb 07 '14 at 16:28
  • Also see [this](http://stackoverflow.com/questions/2785755/how-to-split-but-ignore-separators-in-quoted-strings-in-python). – devnull Feb 07 '14 at 16:38

2 Answers

Use re.findall with a suitable regex. I'm not sure what your error cases look like (what if there are an odd number of quotes?), but:

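import itertools as it
import re

# list() is needed on Python 3, where filter() returns an iterator
list(filter(None, it.chain(*re.findall(r'"([^"]*?)"|(\S+)', ' streptococcus 7120 "File  being  analysed" rd873 "hello!" hi'))))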
> ['streptococcus',
   '7120',
   'File  being  analysed',
   'rd873',
   'hello!',
   'hi']

looks right.
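
To spell out how the pattern above works: the first alternative captures whatever sits between a pair of double quotes, and the second captures any run of non-whitespace characters. Below is a minimal sketch of the same idea as a function. The name `split_quoted` is made up for illustration, and it differs slightly from the one-liner in that an empty quoted string (`""`) is kept as an empty token rather than filtered out.

```python
import re

# Hypothetical helper name; not part of the original answer.
def split_quoted(text):
    """Split on whitespace, keeping double-quoted runs together."""
    tokens = []
    for quoted, bare in re.findall(r'"([^"]*)"|(\S+)', text):
        # At most one of the two groups is non-empty for each match.
        tokens.append(quoted or bare)
    return tokens

print(split_quoted(' streptococcus 7120 "File  being  analysed" rd873 '))
# ['streptococcus', '7120', 'File  being  analysed', 'rd873']
```

With an odd number of quotes, the unmatched quote fails the first alternative and falls through to the second, so `'a "b c'` comes back as `['a', '"b', 'c']` with the quote character left in place.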

U2EF1
  • Did you test this on OP's sample string? It doesn't do what you want it to do. (Well, now it does after you edited it.) – senshin Feb 07 '14 at 16:30
You want shlex.split, which gives you the behavior you want with the quotes.

import shlex

string = ' streptococcus 7120 "File  being  analysed" rd873 '
items  = shlex.split(string)

This won't collapse the extra spaces inside the quoted string, but you can do that with a list comprehension:

items  = [" ".join(x.split()) for x in shlex.split(string)]

Look, ma, no regex!
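
One thing to be aware of: shlex.split raises ValueError when a quote is left unclosed. Here is a small sketch that combines the two steps above and guards against that case. The function name `split_and_normalise` and the choice to return None on bad input are assumptions for illustration, not part of the answer above.

```python
import shlex

# Hypothetical helper name; returning None on bad input is just one option.
def split_and_normalise(text):
    """Split shell-style, then collapse whitespace runs inside each token."""
    try:
        tokens = shlex.split(text)
    except ValueError:
        # Raised by shlex when a quotation is never closed.
        return None
    return [" ".join(tok.split()) for tok in tokens]

print(split_and_normalise(' streptococcus 7120 "File  being  analysed" rd873 '))
# ['streptococcus', '7120', 'File being analysed', 'rd873']
print(split_and_normalise('7120 "File being'))
# None
```

Depending on the data, re-raising the error or logging the offending line may be a better fit than silently returning None.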

kindall