0

How would you parse the ['i386', 'x86_64'] out of a string like '-foo 23 -bar -arch ppc -arch i386 -isysroot / -fno-strict-aliasing -fPIC'?

>>> my_arch_parse_function('-foo 23 -bar -arch i386 -arch x86_64 -isysroot /  -fno-strict-aliasing -fPIC')
['i386', 'x86_64']

Can this be done using regex, or only using modules like PyParsing, or manually splitting and iterating over the splits?

Assumption: -arch VAL are grouped together.

Sridhar Ratnakumar

6 Answers

4

Why not use the argument parsing modules? optparse in Python 2.6 (and 3.1) and argparse in Python 2.7 (and 3.2).

EDIT: On second thought, that's not as simple as it sounds, because you may have to define all the arguments you are likely to see (I'm not sure whether these modules have a catch-all mechanism). I'll leave the answer here because it might work, but take it with a grain of salt.
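On the catch-all question: argparse does provide `parse_known_args`, which returns the options it recognizes and hands back everything else untouched. A minimal sketch, assuming Python 3.5+ (for `allow_abbrev`) and a hypothetical helper name `parse_archs`; only `-arch` is declared, all other flags are ignored:

```python
import argparse
import shlex

def parse_archs(s):
    # Declare only the option we care about; add_help=False avoids registering -h.
    parser = argparse.ArgumentParser(add_help=False, allow_abbrev=False)
    # action='append' collects every occurrence of -arch into a list.
    parser.add_argument('-arch', action='append')
    # parse_known_args returns (namespace, leftover_args) instead of
    # erroring out on flags it does not know.
    known, _unknown = parser.parse_known_args(shlex.split(s))
    return known.arch or []

print(parse_archs('-foo 23 -bar -arch i386 -arch x86_64 -isysroot / -fno-strict-aliasing -fPIC'))
# ['i386', 'x86_64']
```

This has not been checked against unusual inputs (for example, values that themselves start with `-`), so treat it as a starting point.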

Marcelo Cantos
3

Regex: (?<=-arch )[^ ]+

>>> import re
>>> re.findall( r"(?<=-arch )([^ ]+)", r"'-foo 23 -bar -arch ppc -arch i386 -isysroot -fno-strict-aliasing -fPIC'" )
['ppc', 'i386']

Arbitrary whitespace

>>> foo = re.compile( r"(?<=-arch)\s+[^\s]+" )
>>> [ s.strip() for s in foo.findall( r"'-foo 23 -bar -arch ppc -arch i386 -isysroot -fno-strict-aliasing -fPIC'" ) ]
['ppc', 'i386']

P.S. There's no x86_64 in that string, and are you trying to differentiate between -arch ppc and -arch i386?
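Another way around the fixed-width lookbehind restriction is a plain capturing group: when the pattern has exactly one group, `re.findall` returns only that group, so no stripping is needed. A minimal sketch (the names `ARCH_RE` and `find_archs` are made up for illustration; the `(?<!\S)` guard is an extra assumption so that something like `--arch` is not picked up):

```python
import re

# (?<!\S) : -arch must start a token (preceded by whitespace or start of string)
# \s+     : any run of spaces or tabs after the flag
# (\S+)   : the value; the only group, so findall returns just this
ARCH_RE = re.compile(r'(?<!\S)-arch\s+(\S+)')

def find_archs(s):
    return ARCH_RE.findall(s)

print(find_archs('-foo 23 -bar -arch ppc -arch \t i386 -isysroot / -fPIC'))
# ['ppc', 'i386']
```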

Katriel
  • I noticed that this regex does not handle extra whitespace after `-arch`. `re.findall( r"(-arch\s+)([^ ]+)", [...]` works, but it returns a list of tuples. – Sridhar Ratnakumar Jul 28 '10 at 18:27
  • 1
    If you don't know how much whitespace might be after `-arch`, you'll have to strip it from the string after matching -- lookbehinds must be fixed-width. See above. – Katriel Jul 28 '10 at 18:31
  • Final verdict: `r"(?<=-arch)\s+([^\s]+)"` which will not even require manual stripping. – Sridhar Ratnakumar Jul 28 '10 at 18:37
  • That doesn't work. Specifically, try it on `"-arch  i386"` (two spaces) -- it'll capture the whitespace minus one space into the first group, but the regex still matches the whole expression. Also, it requires that the first character after `-arch` is a space; what if it's a tab? – Katriel Jul 28 '10 at 18:41
  • @katrielalex: what doesn't work? the 'arbitrary whitespace' example in your answer? – Sridhar Ratnakumar Jul 30 '10 at 17:08
  • Hmm, I'm not sure what I was thinking. The point I was trying to make was that the regex you posted above doesn't work, but it does! I think you might have edited it before I posted the message =p? – Katriel Jul 30 '10 at 20:31
2

Would you consider a non-regex solution? Simpler:

>>> def my_arch_parse_function(s):
...     args = s.split()
...     idxs = (i+1 for i,v in enumerate(args) if v == '-arch')
...     return [args[i] for i in idxs]
...     
... 
>>> s='-foo 23 -bar -arch ppc -arch i386 -isysroot / -fno-strict-aliasing -fPIC'
>>> my_arch_parse_function(s)
['ppc', 'i386']
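
One edge case with the index approach: if `-arch` is the last token, `args[i]` raises an `IndexError`. A possible variant, sketched here with a hypothetical name `archs_from_pairs`, walks adjacent token pairs instead, and uses `shlex.split` so quoted values with spaces stay in one piece:

```python
import shlex

def archs_from_pairs(s):
    # shlex.split honours shell-style quoting, unlike str.split.
    tokens = shlex.split(s)
    # zip(tokens, tokens[1:]) yields (token, next_token) pairs and simply
    # stops at the end, so a trailing '-arch' with no value is skipped.
    return [value for flag, value in zip(tokens, tokens[1:]) if flag == '-arch']

print(archs_from_pairs('-foo 23 -bar -arch ppc -arch i386 -isysroot / -fno-strict-aliasing -fPIC'))
# ['ppc', 'i386']
```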
Muhammad Alkarouri
0

Answering my own question, I found a regex via this tool:

>>> regex = re.compile(r"(?P<key>\-arch\s?)(?P<value>[^\s]+?)\s|$")
>>> r = regex.search(string)
>>> r
<_sre.SRE_Match object at 0x8aa59232ae397b10>
>>> regex.match(string)
None

# List the groups found
>>> r.groups()
(u'-arch ', u'ppc')

# List the named dictionary objects found
>>> r.groupdict()
{u'key': u'-arch ', u'value': u'ppc'}

# Run findall
>>> regex.findall(string)
[(u'-arch ', u'ppc'), (u'-arch ', u'i386'), (u'', u'')]
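
The trailing `(u'', u'')` comes from the top-level `|$`: alternation binds more loosely than concatenation, so the pattern reads as "the whole `-arch ...` expression, OR end of string", and the empty match at the end of the string shows up with empty groups. A sketch of one possible fix, wrapping the alternatives in a non-capturing group and, as an additional assumption, using `\s+` so extra whitespace after `-arch` is tolerated (`arch_values` is a made-up helper name):

```python
import re

# (?:\s|$) scopes the alternation to "whitespace or end of string" only.
regex = re.compile(r"(?P<key>-arch\s+)(?P<value>\S+?)(?:\s|$)")

def arch_values(s):
    # finditer gives match objects, so the named group can be read directly.
    return [m.group('value') for m in regex.finditer(s)]

print(arch_values('-foo 23 -bar -arch ppc -arch i386'))
# ['ppc', 'i386']
```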
Sridhar Ratnakumar
0

Try this if you want regex:

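import re
# arch_list: the architecture names you expect; arg_string: the flags string to search
arch_regex = re.compile(r'\s+(' + '|'.join(arch_list) + r')\s+', re.I)
results = arch_regex.findall(arg_string)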

A little too much regex for my taste, but it works. For future reference, it is better to use optparse for command line option parsing.

krs1
  • Ugh. And if you know all the arguments that might arrive, you should use argparse in the first place! – Katriel Jul 28 '10 at 18:42
0

Hand-made with Python 2.6. I am sure that you or a library can do a better job.

inp = '-foo 23 -bar -arch ppc -arch i386 -isysroot / -fno-strict-aliasing -fPIC'.split()
dct = {}
noneSet = set([None])

flagName = None  # most recent flag seen; values that follow are filed under it
for param in inp:
    if param.startswith('-'):
        flagName = param
        if flagName not in dct:
            dct[flagName] = set()
        dct[flagName].add(None)
        continue
    # Else found a value
    dct[flagName].add(param)

print(dct)

result = sorted(dct['-arch'] - noneSet)
print(result)

>>> ================================ RESTART ================================
>>> 
{'-arch': set(['ppc', 'i386', None]), '-isysroot': set([None, '/']), '-fno-strict-aliasing': set([None]), '-fPIC': set([None]), '-foo': set([None, '23']), '-bar': set([None])}
['i386', 'ppc']
>>> 
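
For what it's worth, the same idea can be sketched a bit more compactly with `collections.defaultdict`, using lists instead of sets so the original order is kept and no `None` placeholder is needed. The function name `group_flag_values` is invented for this example, and like the code above it assumes every token starting with `-` is a flag:

```python
from collections import defaultdict

def group_flag_values(s):
    """Map each '-flag' to the list of values that follow it, in order."""
    groups = defaultdict(list)
    flag = None
    for token in s.split():
        if token.startswith('-'):
            flag = token
            groups[flag]  # touching the key records flags that take no value
        elif flag is not None:
            groups[flag].append(token)
    return dict(groups)

flags = group_flag_values('-foo 23 -bar -arch ppc -arch i386 -isysroot / -fno-strict-aliasing -fPIC')
print(flags['-arch'])
# ['ppc', 'i386']
```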
Hamish Grubijan