9

How do I split a string with Python's shlex while preserving the quote characters that shlex splits on?

Sample Input:

Two Words
"A Multi-line
 comment."

Desired Output:

['Two', 'Words', '"A Multi-line\ncomment."']

Note the double quotes wrapping the multi-line string. I read through the shlex documentation, but I don't see an obvious option. Does this require a regular expression solution?

Petrus Theron
  • 1
    This smells strongly of an [XY problem](http://meta.stackexchange.com/questions/66377/what-is-the-xy-problem). What are you actually trying to do? What is the output for? What makes you think `shlex` is the right answer? Are you really looking for something that works just like a POSIX shell's argument parsing, except for this one way in which it works completely differently? – abarnert Dec 23 '13 at 23:40

2 Answers

8
>>> import shlex
>>> s = 'Two Words\n"A Multi-line\n comment."'
>>> print(s)
Two Words
"A Multi-line
 comment."
>>> shlex.split(s)
['Two', 'Words', 'A Multi-line\n comment.']
>>> shlex.split(s, posix=False)
['Two', 'Words', '"A Multi-line\n comment."']

From the `shlex.split` documentation: "Changed in version 2.6: Added the posix parameter."

kxr
  • shlex.split(s, posix=False) does not keep the quote as a single element. It splits it like the other words. – John Glen Apr 03 '22 at 14:11
3

I'm not sure why you're trying to use shlex for this. The whole point is to split into the same arguments the shell would. As far as the shell is concerned, those quotes are not part of the argument. So, this is probably the wrong thing to do…

But if you want to do it, you can access the lower levels of the shlex parser, which makes this trivial. For example:

>>> import shlex
>>> data = '''Two Words
"A Multi-line
 comment."'''
>>> sh = shlex.shlex(data)
>>> sh.get_token()
'Two'
>>> sh.get_token()
'Words'
>>> sh.get_token()
'"A Multi-line\n comment."'

>>> sh.get_token()
''

So, if you want to get this as a list, you can do this one-liner:

>>> list(iter(shlex.shlex(data).get_token, ''))

I believe this requires Python 2.3+, but since you linked to the docs from 3.4 I doubt that's a problem. Anyway, I verified that it works in both 2.7 and 3.3.
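For anyone who wants the one-liner as a standalone script, here is a minimal sketch. The function name `tokens_with_quotes` is made up for illustration; the rest is the same `shlex.shlex(...).get_token` loop as above, relying on the default (non-POSIX) lexer returning `''` at end of input.

```python
import shlex


def tokens_with_quotes(text):
    """Hypothetical helper: tokenize with shlex's default (non-POSIX) lexer."""
    # shlex.shlex() is non-POSIX unless posix=True is passed,
    # so quote characters stay attached to the quoted token.
    lexer = shlex.shlex(text)
    # In non-POSIX mode get_token() returns '' at end of input,
    # which iter() uses here as its stop sentinel.
    return list(iter(lexer.get_token, ''))


data = 'Two Words\n"A Multi-line\n comment."'
print(tokens_with_quotes(data))
# ['Two', 'Words', '"A Multi-line\n comment."']
```

Note that the default lexer also breaks on punctuation outside quotes, so this only behaves like a whitespace split for input made of plain words and quoted strings.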

abarnert
  • This solution doesn't really work in general. For example, if `data="-1.0"`, you get `['-', '1', '.', '0']`. As for why someone might want this, shlex is very useful beyond just managing shell arguments in string and data parsing, and is probably the closest thing existing to what the OP wants. – dvntehn00bz Feb 20 '14 at 19:22
  • Just worked a couple of tests myself: If you create a shlex.shlex object beforehand, you can change `whitespace_split = True` to only split whitespace. – dvntehn00bz Feb 20 '14 at 19:30
  • Upvoting because of the first sentence - "I'm not sure why you're trying to use shlex for this.". I realized that I can use string split(). – Ali Feb 27 '20 at 05:52
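The `whitespace_split` suggestion from the comments above can be sketched roughly as follows. The helper name `split_on_whitespace_keep_quotes` is hypothetical; the attribute `whitespace_split` and iteration over a `shlex` object are part of the standard library. As far as I can tell this is close to what `shlex.split(s, posix=False)` does internally, except that `#` is still treated as a comment character here.

```python
import shlex


def split_on_whitespace_keep_quotes(text):
    """Hypothetical helper: split on whitespace only, keeping quote characters."""
    lexer = shlex.shlex(text)      # non-POSIX by default, so quotes are kept
    lexer.whitespace_split = True  # break tokens only at whitespace
    return list(lexer)             # shlex objects are iterable


print(split_on_whitespace_keep_quotes('-1.0'))
# ['-1.0']
print(split_on_whitespace_keep_quotes('Two Words\n"A Multi-line\n comment."'))
# ['Two', 'Words', '"A Multi-line\n comment."']
```

With this setting `-1.0` stays a single token instead of being broken into `['-', '1', '.', '0']`, while the quoted multi-line string is still returned with its quotes.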