Parse string to identify kwargs and args

Question

I am new to Python and looking for an elegant way to do the following.

Say I have a string:

s = u'(鞋子="AAA", last = "BBB", abcd)'

I am thinking of a function that parses the above string and returns output in the following form:

arg, kwarg = foo(s)

def foo():
    # the implementation I don't know
    pass

How can I do this in Python?

Why would you need to do this? There might be a better way to go about it; can you explain more? — Joff, Aug 05 '16 at 23:57
This string is user input, and the input should support Chinese characters too. I want to parse it and figure out which keyword arguments were supplied. — Jai Pandit, Aug 05 '16 at 23:59
I am wondering whether I can write a Python function that takes kwargs as parameters, where a keyword can be a Chinese word. Right now it says invalid syntax. — Jai Pandit, Aug 06 '16 at 00:00
What says invalid syntax? Your foo method here doesn't accept a parameter, so that's the first problem with the code you have shown. — OneCricketeer, Aug 06 '16 at 00:02
Doing so gives an invalid syntax error, so I am thinking about how to parse the string to obtain the args and kwargs. — Jai Pandit, Aug 06 '16 at 00:09
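For context, the "invalid syntax" error most likely comes from Python 2, where identifiers must be ASCII. The sketch below assumes Python 3, which accepts non-ASCII identifiers such as 鞋子 (PEP 3131) and also accepts string keys passed through `**` dictionary unpacking. The function name `collect` is hypothetical:

```python
def collect(*args, **kwargs):
    """Hypothetical helper: return the positional and keyword arguments it receives."""
    return args, kwargs

# Valid syntax on Python 3, because 鞋子 is a legal identifier there
args, kwargs = collect("abcd", 鞋子="AAA", last="BBB")
print(args)    # ('abcd',)
print(kwargs)  # {'鞋子': 'AAA', 'last': 'BBB'}

# Keys obtained by parsing user input can be passed with ** unpacking instead
parsed = {"鞋子": "AAA", "last": "BBB"}
args2, kwargs2 = collect("abcd", **parsed)
print(kwargs2 == kwargs)  # True
```

Either way, the user's string still has to be parsed into a list and a dict first, which is what the answers below address.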

Mark Tolonen · Accepted Answer · 2016-08-06T01:16:04.733

A nice way to parse a string that follows grammar rules is the third-party pyparsing library. This grammar is very generic, since there is no formal definition of the allowed user input:

#coding:utf8
from __future__ import print_function  # same print() calls on Python 2 and 3
from pyparsing import *

# Names for symbols
_lparen = Suppress('(')
_rparen = Suppress(')')
_quote = Suppress('"')
_eq = Suppress('=')

# Parsing grammar definition
data = (_lparen +                        # left parenthesis
        delimitedList(                   # One or more comma-separated items
            Group(                       #   Group the contained unsuppressed tokens in a list
                Regex(r'[^=,)\s]+') +    #     Grab everything up to an equal, comma, endparen or whitespace as a token
                Optional(                #     Optionally...
                    _eq +                #       match an =
                    _quote +             #       a quote
                    Regex(u'[^"]*') +    #       Grab everything up to another quote as a token
                    _quote)              #       a quote
                )                        #   EndGroup - will have one or two items.
            ) +                          # EndList
        _rparen)                         # right parenthesis

def process(s):
    items = data.parseString(s).asList()
    args = [i[0] for i in items if len(i) == 1]
    kwargs = {i[0]:i[1] for i in items if len(i) == 2}
    return args,kwargs

s = u'(鞋子="AAA", last = "BBB", abcd)'
args,kwargs = process(s)
for a in args:
    print(a)
for k,v in kwargs.items():
    print(k, v)

Output:

abcd
鞋子 AAA
last BBB

Amazing script, Mark. Thanks a lot, you saved my day. However, I tried to understand and modify it and failed. The script works very well for the input s = u'(鞋子="AAA", last = "BBB", abcd)' but throws a parsing exception for s = u'(鞋子=AAA, last = "BBB", abcd)'. Note that the value is now AAA, not "AAA". I need the script to parse both cases. It currently throws "pyparsing.ParseException: Expected ")"". — Jai Pandit, Aug 13 '16 at 02:19
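The follow-up case (an unquoted value such as AAA) can be handled by letting a value be either double-quoted or bare. The sketch below swaps pyparsing for the standard-library `re` module, so it is not Mark's grammar; `parse_call`, `ITEM` and the exact token rules are assumptions about what the user input may contain:

```python
import re

# Assumed token rules (hypothetical, adjust to the real input format):
#   name            -> positional argument
#   name = "value"  -> keyword argument with a double-quoted value
#   name = value    -> keyword argument with a bare value (no spaces, commas or parens)
ITEM = re.compile(
    r'\s*([^=,()\s]+)'                          # the name
    r'\s*(?:=\s*(?:"([^"]*)"|([^,()\s]+)))?'    # optional "=" then a quoted or bare value
    r'\s*(?:,|$)'                               # a separating comma or the end of the text
)

def parse_call(s):
    """Split a string such as '(a="1", b=2, c)' into (args, kwargs)."""
    s = s.strip()
    if not (s.startswith("(") and s.endswith(")")):
        raise ValueError("expected text wrapped in parentheses")
    body = s[1:-1].strip()
    args, kwargs = [], {}
    pos = 0
    while pos < len(body):
        m = ITEM.match(body, pos)
        if m is None:
            raise ValueError("cannot parse near %r" % body[pos:])
        name, quoted, bare = m.groups()
        if quoted is not None:
            kwargs[name] = quoted      # name="value"
        elif bare is not None:
            kwargs[name] = bare        # name=value
        else:
            args.append(name)          # plain positional item
        pos = m.end()                  # always advances: a name needs at least one char
    return args, kwargs

print(parse_call('(鞋子=AAA, last = "BBB", abcd)'))
# (['abcd'], {'鞋子': 'AAA', 'last': 'BBB'})
```

Because quoted values are matched as a whole, a comma inside quotes (e.g. `a="x, y"`) stays part of the value. Within pyparsing itself, a comparable change (untested here) would be to replace the `_quote + Regex(u'[^"]*') + _quote` part of the grammar with an alternative such as `QuotedString('"') | Regex(r'[^,)\s]+')`.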

Joran Beasley · Answer 2 · 2016-08-06T00:22:12.677


def foo(s):
    data = {"kwargs": [], "args": []}
    for item in s.split(","):   # iterate over comma-separated items, not characters
        item = item.strip()
        if "=" in item:
            data['kwargs'].append(item)
        else:
            data['args'].append(item)
    return data

s = u'(鞋子="AAA", last = "BBB", abcd)'
s = s[1:-1]  # get rid of the parentheses
print(foo(s))

answered Aug 06 '16 at 00:02 by Joran Beasley, edited Aug 06 '16 at 00:22

`abcd` is not a valid value, though – OneCricketeer Aug 06 '16 at 00:04
This is exactly what I was thinking of, but when I tried it, it didn't work: it says invalid syntax, because it eventually tries to pass a keyword argument with a Chinese word as the key. – Jai Pandit Aug 06 '16 at 00:05
arg(abcd) kwarg{'鞋子': 'AAA', 'last': 'BBB'} – Jai Pandit Aug 06 '16 at 00:18
is what is needed as output. – Jai Pandit Aug 06 '16 at 00:18
Why do we need to encode the unicode data in your solution? – Jai Pandit Aug 06 '16 at 00:20
Maybe a regex can help solve this problem. – Jai Pandit Aug 06 '16 at 00:24
@JaiPandit Comments are not like a chat room. Please consolidate your comments instead of posting 30 seconds apart with 50 characters each... – Chris Cirefice Aug 06 '16 at 00:25
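The output Jai describes in the comments (positional values in a list, keyword values in a dict with the quotes removed) can be sketched by extending the answer's idea of separating items that contain `=` from those that don't. This is an illustration, not code from the thread: `split_args` is a hypothetical name, and because it simply splits on commas it will break if a quoted value itself contains a comma.

```python
def split_args(s):
    """Return (args, kwargs) for a string such as '(鞋子="AAA", last = "BBB", abcd)'.

    Sketch only: assumes the text is wrapped in parentheses and that no
    value contains a comma, because it simply splits on ",".
    """
    body = s.strip()[1:-1]          # drop the surrounding parentheses
    args, kwargs = [], {}
    for item in body.split(","):
        item = item.strip()
        if not item:
            continue                # skip empty pieces, e.g. from "()"
        if "=" in item:
            key, value = item.split("=", 1)
            value = value.strip()
            # remove one pair of surrounding double quotes, if present
            if len(value) >= 2 and value[0] == value[-1] == '"':
                value = value[1:-1]
            kwargs[key.strip()] = value
        else:
            args.append(item)
    return args, kwargs

args, kwargs = split_args('(鞋子="AAA", last = "BBB", abcd)')
print(args)    # ['abcd']
print(kwargs)  # {'鞋子': 'AAA', 'last': 'BBB'}
```

Since the quotes are optional in this sketch, an unquoted value such as `鞋子=AAA` gives the same result.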
