
I am using Python to parse a string that is passed in through the optparse module. I want to split the string on certain delimiters, but not between quotation marks. A sample string is:

--state-basedir /dir/dir/dir/ --cmd=\"param load $v2param\" --master=/dev/ttyUSB0 --console --map --out=udp:192.168.1.1:14550

This string is passed in as a single optparse argument, and I am then going to pass it on to another process. I have been trying various things at http://pythex.org/. The closest I have gotten is:

`(?<!")[\s=](?![\s0-9a-zA-Z\$\\]*")`

The issue is that the = sign after --cmd and the space before --master are not matched.
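
A minimal reproduction of this with re.split, assuming the string reaches the script with plain double quotes (I take the backslashes in the sample to be shell escapes). The names `s`, `pattern` and `parts` are only for illustration:

```python
import re

# Sample string, assuming the quotes arrive unescaped.
s = ('--state-basedir /dir/dir/dir/ --cmd="param load $v2param" '
     '--master=/dev/ttyUSB0 --console --map --out=udp:192.168.1.1:14550')

# The pattern from above, written as a raw string.
pattern = r'(?<!")[\s=](?![\s0-9a-zA-Z\$\\]*")'

parts = re.split(pattern, s)

# The = after --cmd is rejected by the look-ahead (a quote follows it directly),
# and the space before --master is rejected by the look-behind (a quote precedes it),
# so that whole stretch stays in one piece.
print(parts)
```

The third element comes back as `--cmd="param load $v2param" --master`, which is the part I am trying to break up.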

In plain English, this is how I am reading my regex:

Match either a whitespace character or an equals sign, as long as it is not preceded by a quotation mark and is not followed by any combination of letters, numbers, and punctuation ending in another quotation mark.

I had a feeling that I was missing something else, such as greediness, so I tried adding ? after my look-ahead and look-behind terms. If I put a ? after the look-behind, I get the space before --master, but if I put the ? after the look-ahead, I also get the spaces inside the quotation marks, which I don't want.

The idea here is that I am going to use re.split to handle things.

Thanks for any explanations as to what I am doing wrong.

Jesse
  • See [Why use argparse rather than optparse?](http://stackoverflow.com/questions/3217673/why-use-argparse-rather-than-optparse) – Wiktor Stribiżew Nov 04 '16 at 19:48
  • @WiktorStribiżew thanks for that, but I don't think my issue is necessarily a limitation of optparse. I am calling a different python script with my example as a single string. This single string is what I have to parse to pass on to another process. I am trying to fix a small issue I am having with the mavproxy.py script for use with drones – Jesse Nov 04 '16 at 20:16
  • Well, I see, the point I am trying to make is that parsing command line arguments with regex is a real pain and since there are better options, you'd better stick with them. – Wiktor Stribiżew Nov 04 '16 at 20:19
  • @WiktorStribiżew, ahh my misunderstanding – Jesse Nov 04 '16 at 20:20

1 Answer


This is not a regex answer and it's also not pretty, but it is one line.

 sum([[x] if '"' in x else re.split(' |=',x) for x in re.split('=(\".+?\" )',a)],[])

output:

['--state-basedir', '/dir/dir/dir/', '--cmd', '"param load $v2param" ', '--master', '/dev/ttyUSB0', '--console', '--map', '--out', 'udp:192.168.1.1:14550']

Starting from re.split('=(\".+?\" )',a): this splits out text surrounded by quotes (more specifically, ="something another thing" followed by a space). The resulting pieces are then split further with re.split(' |=',x) if they do not have a " in them, or are returned as is, wrapped as [x], if they do. The last step flattens the resulting list of lists by giving sum an empty list as its start value: sum(two_d_list,[]).
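
The same steps, unrolled into a loop as a rough sketch. The function name `split_args` is made up for illustration, and `a` is assumed to hold the sample string with plain (unescaped) double quotes:

```python
import re

# Sample string from the question, assuming the quotes arrive unescaped.
a = ('--state-basedir /dir/dir/dir/ --cmd="param load $v2param" '
     '--master=/dev/ttyUSB0 --console --map --out=udp:192.168.1.1:14550')

def split_args(text):
    # Hypothetical helper: the one-liner above, written out step by step.
    # Step 1: cut out ="..." groups; the capture group keeps the quoted text.
    pieces = re.split(r'=(".+?" )', text)
    result = []
    for piece in pieces:
        if '"' in piece:
            # Quoted piece: keep it whole (trailing space included).
            result.append(piece)
        else:
            # Unquoted piece: split on spaces and equals signs.
            result.extend(re.split(r' |=', piece))
    return result

tokens = split_args(a)
print(tokens)
```

Using extend on a single list avoids the repeated list copying that sum(two_d_list,[]) does, though for a string this short it makes no practical difference.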

I hope this answer helps, but I understand if it isn't what you're looking for.
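
If the quote characters themselves do not need to survive, a different technique from the one-liner may be worth a look: shlex.split from the standard library, which does shell-style splitting and respects quotes. This is only a sketch under assumptions: the helper name `split_with_shlex` is made up, the quotes are assumed to arrive unescaped, and every word containing an = is assumed to be an option=value pair.

```python
import shlex

# Sample string from the question, assuming the quotes arrive unescaped.
a = ('--state-basedir /dir/dir/dir/ --cmd="param load $v2param" '
     '--master=/dev/ttyUSB0 --console --map --out=udp:192.168.1.1:14550')

def split_with_shlex(text):
    # Hypothetical helper: shell-style split, then break option=value pairs.
    tokens = []
    for word in shlex.split(text):
        # Split on the first = only, so values containing = stay intact.
        name, sep, value = word.partition('=')
        tokens.append(name)
        if sep:
            tokens.append(value)
    return tokens

tokens = split_with_shlex(a)
print(tokens)
```

Unlike the regex version, the quoted value comes back without its quotes or trailing space (`param load $v2param`), so this only fits if the receiving process does not need them.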

mitoRibo