I have the following text

text = 'This is "a simple" test'

And I need to split it in two ways, first by quotes and then by spaces, resulting in:

res = ['This', 'is', '"a simple"', 'test']

But with str.split() I can only use either quotes or spaces as the delimiter. Is there a built-in function that handles multiple delimiters?

Moses Koledoye
wasp256

6 Answers

You can use shlex.split, handy for parsing quoted strings:

>>> import shlex
>>> text = 'This is "a simple" test'
>>> shlex.split(text, posix=False)
['This', 'is', '"a simple"', 'test']

Doing this in non-posix mode prevents the removal of the inner quotes from the split result. posix is set to True by default:

>>> shlex.split(text)
['This', 'is', 'a simple', 'test']

If you have multiple lines of this type of text or you're reading from a stream, you can split efficiently (excluding the quotes in the output) using csv.reader:

import io
import csv

s = io.StringIO(text)  # in-memory text stream
f = csv.reader(s, delimiter=' ', quotechar='"')
print(list(f))
# [['This', 'is', 'a simple', 'test']]

This is written for Python 3, where all strings are already unicode. On Python 2, decode the byte string first with text.decode('utf8'), since io.StringIO only accepts unicode there.
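For the multi-line case mentioned above, here is a minimal sketch on Python 3. The two-line input in `data` is made up for illustration; passing `newline=''` follows the csv documentation's recommendation for file-like objects.

```python
import csv
import io

# Hypothetical multi-line input, used only for illustration.
data = 'This is "a simple" test\nAnother "quoted phrase" here\n'

# newline='' leaves line endings untouched so csv.reader can handle them itself.
stream = io.StringIO(data, newline='')

# Each input line becomes one row; quoted phrases stay together, quotes are dropped.
rows = list(csv.reader(stream, delimiter=' ', quotechar='"'))
print(rows)
# [['This', 'is', 'a simple', 'test'], ['Another', 'quoted phrase', 'here']]
```

The same reader works unchanged on an open file object, so nothing has to be loaded into memory up front.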

Moses Koledoye

For your case, shlex.split will do just fine.

As for splitting on multiple delimiters in general, you can use re.split with an alternation:

import re

re.split(r'"|\s', text)

Rahul

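If I understand you correctly, you can use a regex, although it also splits the quoted phrase and leaves empty strings wherever two delimiters are adjacent:

>>> import re
>>> text = 'This is "a simple" test'
>>> re.split(r'\s|"', text)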

['This', 'is', '', 'a', 'simple', '', 'test']
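
The result above breaks up "a simple", which is not what the question asks for. If you want to stay with regex, one possible alternative is to match the tokens with re.findall instead of splitting on the delimiters. This is only a sketch: the pattern is my own and assumes balanced double quotes with no escaped quotes inside them.

```python
import re

text = 'This is "a simple" test'

# Match either a double-quoted run (quotes included) or a run of non-whitespace.
# The quoted alternative comes first so that it wins at an opening quote.
tokens = re.findall(r'"[^"]*"|\S+', text)
print(tokens)
# ['This', 'is', '"a simple"', 'test']
```

For input with escaped or nested quotes, shlex.split is likely the safer choice.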

Samat Sadvakasov

Using csv.reader:

import csv

text = 'This is "a simple" test'
# csv.reader accepts any iterable of lines, so wrap the string in a list
for row in csv.reader([text], delimiter=" "):
    print(row)
# ['This', 'is', 'a simple', 'test']

You can also read more about it here

R.A.Munna

Try using re:

import re
text = 'This is "a simple" test'
print(re.split(r'"|\s', text))

The result:

['This', 'is', '', 'a', 'simple', '', 'test']
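
If the goal is only to split on several delimiters, the empty strings in that result can be filtered out. Here is a minimal sketch, assuming the quoted phrase does not need to stay together. The character class and the list comprehension are my own additions, not part of the snippet above.

```python
import re

text = 'This is "a simple" test'

# Split on a double quote or any whitespace character, then drop the
# empty strings that appear where two delimiters sit next to each other.
parts = [p for p in re.split(r'["\s]', text) if p]
print(parts)
# ['This', 'is', 'a', 'simple', 'test']
```

This discards the grouping that the quotes expressed, so it answers the "multiple delimiters" part of the question but does not produce the desired res list.
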
Marcos Rusiñol

You can look into the shlex library.

from shlex import split
a = 'This is "a simple" text'
split(a)

['This', 'is', 'a simple', 'text']

I don't think regex is what you are looking for.

Rishabh Rusia