3

I would like to split a string into 3 elements by spaces, but I don't want quoted substrings to be split (they can also contain backslashes to escape the quotes).

For instance:

"command argument other arguments and options"
>> ['command', 'argument', 'other arguments and options']

'command "my argument" other arguments and options'
>> ['command', 'my argument', 'other arguments and options']

'command "my \"ugly\" argument" other "arguments" and options'
>> ['command', 'my "ugly" argument', 'other "arguments" and options']

I had a look at this similar question, but shlex.split() will also split the end of the string (and it will remove the quotes and the spaces), whereas I want to keep the third element intact.

I tried to use shlex.split(mystring)[0:2] to get the first two elements, but then I can't find a good way to extract the third element from the original string. Ideally, I could use shlex.split() like the str.split() method, with a maxsplit argument.

Is there a better way to do this than using shlex.split()? Perhaps regexes? Thanks!

Community
Nicolas

2 Answers

5

You should be able to hack a solution by accessing the parser state of a shlex object:

>>> import shlex
>>> s = shlex.shlex("command 'my \'ugly\' argument' other \"arguments\" and options", posix=True)
>>> s.whitespace_split = True
>>> s.commenters = ''
>>> next(s)
'command'
>>> next(s)
'my ugly argument'
>>> s.instream.read()
'other "arguments" and options'

See shlex.py module source.
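
The same idea can be wrapped in a small helper that behaves roughly like a maxsplit argument. This is only a sketch: the name split_keep_rest and its maxsplit parameter are made up for illustration and are not part of shlex. It also assumes the lexer has consumed nothing beyond the whitespace that ended the last token, and it strips leading whitespace from the remainder.

```python
import shlex


def split_keep_rest(text, maxsplit=2):
    """Split off up to `maxsplit` shell-style tokens; keep the rest verbatim.

    Hypothetical helper, not a shlex API.
    """
    lex = shlex.shlex(text, posix=True)
    lex.whitespace_split = True   # split on whitespace only
    lex.commenters = ''           # do not treat '#' as a comment start
    tokens = []
    for _ in range(maxsplit):
        token = lex.get_token()
        if token is None:         # posix mode returns None at end of input
            break
        tokens.append(token)
    # Whatever the lexer has not consumed yet is still in the input stream.
    rest = lex.instream.read().lstrip()
    if rest:
        tokens.append(rest)
    return tokens


print(split_keep_rest(r'command "my \"ugly\" argument" other "arguments" and options'))
# ['command', 'my "ugly" argument', 'other "arguments" and options']
```

Using get_token() instead of next() avoids a StopIteration when the input has fewer tokens than maxsplit. Unbalanced quotes in the first tokens will still raise ValueError, as with shlex.split().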

ecatmur
1

Why not re-join the remaining arguments after splitting the string with shlex?

command = command[:2] + [' '.join(command[2:])]
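
Note that the re-joined tail is built from already-parsed tokens, so it may not match the original text. A minimal sketch, with an input string chosen only for illustration:

```python
import shlex

text = 'command "my argument" other "arguments" and options'
parts = shlex.split(text)

# Re-join everything after the first two tokens with single spaces.
command = parts[:2] + [' '.join(parts[2:])]
print(command)
# ['command', 'my argument', 'other arguments and options']
```

The quotes around "arguments" are gone, because shlex.split() removed them before the join.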

Alternatively, you'd have to drive the shlex.shlex() instance yourself:

>>> import shlex
>>> text = "command 'my \'ugly\' argument' other \"arguments\" and options"
>>> lex = shlex.shlex(text, posix=True)
>>> lex.whitespace_split = True
>>> lex.commenters = ''
>>> command = [next(lex), next(lex), lex.instream.read()]
>>> command
['command', 'my ugly argument', 'other "arguments" and options']

The .instream attribute is the file-like object holding the text being parsed, and will thus contain the remainder after parsing the first two arguments.

You may also need to check the pushback state, though, where the lexer stores tokens it has already read but did not need for the current token:

>>> command = [next(lex), next(lex), ''.join(lex.pushback) + lex.instream.read()]
Martijn Pieters
    I tried your first solution, but `shlex.split()` will also alter the remaining arguments/options by removing the quotes or spaces for instance. Edit: the second works as expected, thanks! – Nicolas Nov 06 '12 at 10:37