
I am new to Python and am looking for an elegant way to do the following.

Say I have a string:

s = u'(鞋子="AAA", last = "BBB", abcd)'

I am looking for a function that parses the string above and returns the positional items (here abcd) and the keyword items (here 鞋子="AAA" and last = "BBB") separately, like this:

arg, kwarg = foo(s)

def foo():
    # I don't know the implementation.

How can I do this in Python?

OneCricketeer
Jai Pandit
  • Why would you need to do this? There might be a better way to go about it. Can you explain more? – Joff Aug 05 '16 at 23:57
  • This string is user input, and the input has to support Chinese characters too. I want to parse it and figure out which keyword arguments were supplied. – Jai Pandit Aug 05 '16 at 23:59
  • I am wondering whether I can write a Python function that takes **kwargs where a keyword can be a Chinese character. Right now it says "invalid syntax". – Jai Pandit Aug 06 '16 at 00:00
  • What says invalid syntax? Your foo method here doesn't accept a parameter, so that's the first problem with the code you have shown. – OneCricketeer Aug 06 '16 at 00:02
  • arg, kwarg = eval('dict' + s) – Jai Pandit Aug 06 '16 at 00:09
  • Doing so gives an invalid syntax error, so I am wondering how I can parse the string to obtain the args and kwargs. – Jai Pandit Aug 06 '16 at 00:09
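For context on the "invalid syntax" error: Python 2 identifiers are ASCII-only, so 鞋子="AAA" cannot be a keyword argument there. Python 3 accepts non-ASCII identifiers (PEP 3131), but eval('dict' + s) still fails on this input because abcd is a positional argument placed after keyword arguments. Even if it compiled, dict(...) would not return an (arg, kwarg) pair, and calling eval on user input is unsafe regardless. The small check below assumes Python 3 and only compiles the text, never evaluating it; the helper name compiles_as_call is hypothetical.

```python
def compiles_as_call(text):
    """Return True if 'dict' + text is a syntactically valid Python expression.

    The text is only compiled, never evaluated (hypothetical helper).
    """
    try:
        compile("dict" + text, "<user input>", "eval")
    except SyntaxError:
        return False
    return True


# Non-ASCII keyword names are valid syntax in Python 3 (PEP 3131).
print(compiles_as_call('(鞋子="AAA", last = "BBB")'))        # True
# A positional argument after keyword arguments is not.
print(compiles_as_call('(鞋子="AAA", last = "BBB", abcd)'))  # False
```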

2 Answers


A nice way to parse a string that follows grammar rules is the third-party pyparsing library. The grammar below is fairly generic, since there is no formal definition of the allowed user input:

#coding:utf8
from pyparsing import *

# Names for symbols
_lparen = Suppress('(')
_rparen = Suppress(')')
_quote = Suppress('"')
_eq = Suppress('=')

# Parsing grammar definition
data = (_lparen +                        # left parenthesis
        delimitedList(                   # One or more comma-separated items
            Group(                       #   Group the contained unsuppressed tokens in a list
                Regex(u'[^=,)\s]+') +    #     Grab everything up to an equal, comma, endparen or whitespace as a token
                Optional(                #     Optionally...
                    _eq +                #       match an = 
                    _quote +             #       a quote
                    Regex(u'[^"]*') +    #       Grab everything up to another quote as a token
                    _quote)              #       a quote
                )                        #   EndGroup - will have one or two items.
            ) +                          # EndList
        _rparen)                         # right parenthesis

def process(s):
    items = data.parseString(s).asList()
    args = [i[0] for i in items if len(i) == 1]
    kwargs = {i[0]:i[1] for i in items if len(i) == 2}
    return args,kwargs

s = u'(鞋子="AAA", last = "BBB", abcd)'
args,kwargs = process(s)
for a in args:
    print a
for k,v in kwargs.items():
    print k,v

Output:

abcd
鞋子 AAA
last BBB
Mark Tolonen
  • Amazing script, Mark. Thanks a lot, you saved my day. However, I tried to understand and modify it but failed. The script works very well for s = u'(鞋子="AAA", last = "BBB", abcd)' but throws a parse exception for s = u'(鞋子=AAA, last = "BBB", abcd)'. Note that the value is now AAA rather than "AAA". I need the script to handle both cases. It currently throws "pyparsing.ParseException: Expected ")"". – Jai Pandit Aug 13 '16 at 02:19
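The follow-up asks for values both with and without quotes. Below is a sketch of one way to do that, assuming Python 3 and swapping pyparsing for the standard-library re module (so it is not a modification of the grammar above). Each item is a key or bare word, optionally followed by = and either a double-quoted value or an unquoted run of characters up to the next comma, parenthesis, quote or whitespace. The names ITEM and parse_call are hypothetical.

```python
import re

# One comma-separated item (hypothetical pattern): a key or bare word,
# optionally followed by "=" and a quoted or unquoted value.
ITEM = re.compile(r'''
    \s*(?P<key>[^=,()"\s]+)\s*      # keyword name, or a bare positional word
    (?:=\s*                         # optionally "=" followed by a value that is
        (?:"(?P<quoted>[^"]*)"      #   either wrapped in double quotes
          |(?P<bare>[^=,()"\s]+))   #   or unquoted, up to , ( ) = " or space
        \s*)?
    (?:,|$)                         # every item ends at a comma or the end
''', re.VERBOSE)


def parse_call(text):
    """Split '(k="v", k2=v2, word)' into (positional list, keyword dict)."""
    text = text.strip()
    if not (text.startswith("(") and text.endswith(")")):
        raise ValueError("expected text wrapped in parentheses")
    body = text[1:-1].strip()
    args, kwargs = [], {}
    pos = 0
    while pos < len(body):
        # Match item by item so malformed input raises instead of being skipped.
        m = ITEM.match(body, pos)
        if m is None:
            raise ValueError("cannot parse near %r" % body[pos:])
        value = m.group("quoted")
        if value is None:
            value = m.group("bare")
        if value is None:
            args.append(m.group("key"))      # no "=": positional argument
        else:
            kwargs[m.group("key")] = value   # "key=value": keyword argument
        pos = m.end()
    return args, kwargs


s = '(鞋子=AAA, last = "BBB", abcd)'
args, kwargs = parse_call(s)
print(args)    # ['abcd']
print(kwargs)  # {'鞋子': 'AAA', 'last': 'BBB'}
```

Because quoted values are matched as a whole, a comma inside quotes (for example k="a, b") stays part of the value instead of splitting the item.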
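def foo(s):
    data = {"kwargs": [], "args": []}
    # Split into comma-separated items; iterating over the string itself
    # would visit one character at a time. (Breaks if a quoted value contains a comma.)
    for item in s.split(","):
        item = item.strip()
        if "=" in item:
            data["kwargs"].append(item)
        else:
            data["args"].append(item)
    return data

s = u'(鞋子="AAA", last = "BBB", abcd)'
s = s[1:-1]  # get rid of the parentheses
print(foo(s))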
Joran Beasley
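Along the lines of the answer above (classifying each item by whether it contains =), here is a sketch that returns the (arg, kwarg) pair the question asks for and accepts values with or without double quotes. It assumes Python 3 and that no value contains a comma, since items are found with a plain split; it reuses the question's name foo for illustration only.

```python
def foo(s):
    """Return (args, kwargs) for text like '(k="v", k2 = v2, word)'.

    Assumes no value contains a comma: items are found with a plain split.
    """
    body = s.strip()
    if body.startswith("(") and body.endswith(")"):
        body = body[1:-1]                    # drop the surrounding parentheses
    args, kwargs = [], {}
    for item in body.split(","):
        item = item.strip()
        if not item:
            continue                         # tolerate "()" and a trailing comma
        key, sep, value = item.partition("=")
        if sep:                              # "key = value": keyword argument
            # strip surrounding double quotes from the value, if present
            kwargs[key.strip()] = value.strip().strip('"')
        else:                                # bare word: positional argument
            args.append(item)
    return args, kwargs


arg, kwarg = foo('(鞋子=AAA, last = "BBB", abcd)')
print(arg)    # ['abcd']
print(kwarg)  # {'鞋子': 'AAA', 'last': 'BBB'}
```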