4

I'm trying to find a way to pass a string (coming from outside the python world!) that can be interpreted as **kwargs once it gets to the Python side.

I have been trying to use this pyparsing example, but the string that's being passed in that example is too specific, and I had never heard of pyparsing until now. I'm trying to make it more human-friendly and robust to small differences in spacing, etc. For example, I would like to pass the following.

input_str = "a = [1,2], b= False, c =('abc', 'efg'),d=1"

desired_kwargs = {'a': [1,2], 'b': False, 'c': ('abc','efg'), 'd': 1}

When I try this code though, no love.

from pyparsing import *

# Names for symbols
_quote = Suppress('"')
_eq = Suppress('=')

# Parsing grammar definition
data = (                        
        delimitedList(                   # Zero or more comma-separated items
            Group(                       #   Group the contained unsuppressed tokens in a list
                Regex(r'[^=,)\s]+') +    #     Grab everything up to an equal, comma, endparen or whitespace as a token
                Optional(                #     Optionally...
                    _eq +                #       match an = 
                    _quote +             #       a quote
                    Regex(u'[^"]*') +    #       Grab everything up to another quote as a token
                    _quote)              #       a quote
                )                        #   EndGroup - will have one or two items.
            ))                           # EndList
              

def process(s):
    items = data.parseString(s).asList()
    args = [i[0] for i in items if len(i) == 1]
    kwargs = {i[0]:i[1] for i in items if len(i) == 2}
    return args,kwargs


def hello_world(named_arg, named_arg_2=1, kwargs_str=""):
    # The kwargs arrive as one string, so take them as a plain string parameter
    print(process(kwargs_str))

hello_world(1, 2, "my_kwargs_are_gross = True, some_bool=False, a_list=[1,2,3]")

# desired output: {'my_kwargs_are_gross': True, 'some_bool': False, 'a_list': [1, 2, 3]}

Requirements:

  1. The '{' and '}' will be appended on the code side.
  2. Only standard types / standard iterables (list, tuple, etc) will be used in the kwargs-string. No special characters that I can think of...
  3. The kwargs-string will be written the way arguments are entered into a function call on the Python side, i.e., 'x=1, y=2', not as the string form of a dictionary.
  4. I think it's a safe assumption that the first step in the string parse will be to remove all whitespace.
mkrieger1
keynesiancross
  • you're right, my answer won't work with the lists, I've deleted it. What you were doing makes more sense. What if, to find the key-val pairs, you go backwards on the string, keep storing the chars as a value, once you find an equal sign, whatever is between the equal sign and the first comma you'd encounter is the key. Repeat. – UdonN00dle Jan 17 '23 at 12:28
  • Yeah, maybe... Let me try that. Seems cleaner than some nasty regex statement or whatever – keynesiancross Jan 17 '23 at 13:15
  • 4
    `**kwargs` isn't really relevant to the question. You just want to parse the given string as a series of key/value pairs in order to define a dict. What you'll do with that dict doesn't matter. – chepner Jan 18 '23 at 23:38
  • So what's the UI for this? Toml KV pairs are quite forgiving, but require a newline between them. I don't think it has tuples but it has arrays – Brady Dean Jan 18 '23 at 23:39
  • Straight string, single line. And yup you’re right, @chepner, it’d just key val stripping. – keynesiancross Jan 19 '23 at 01:28
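
A variant of the scanning idea from the comments can be sketched with only the standard library. This is a rough sketch, not a vetted parser: it scans forward rather than backward, splits on commas that sit outside brackets and quotes, and hands each value to ast.literal_eval. The helper names split_top_level and parse_pairs are made up for illustration.

```python
import ast


def split_top_level(s, sep=","):
    """Split s on sep, ignoring separators inside brackets or quotes."""
    parts, current = [], []
    depth = 0        # current bracket nesting level
    quote = None     # the quote character we are inside, if any
    escaped = False  # True right after a backslash inside a string
    for ch in s:
        if quote:
            # Inside a string literal: only look for its closing quote.
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == quote:
                quote = None
        elif ch in "'\"":
            quote = ch
        elif ch in "([{":
            depth += 1
        elif ch in ")]}":
            depth -= 1
        elif ch == sep and depth == 0:
            # A top-level separator ends the current piece.
            parts.append("".join(current))
            current = []
            continue
        current.append(ch)
    parts.append("".join(current))
    return parts


def parse_pairs(s):
    """Turn "x=1, y=[2, 3]" into {'x': 1, 'y': [2, 3]} (literal values only)."""
    result = {}
    for part in split_top_level(s):
        # Split on the first "=" only, so "=" inside a string value survives.
        key, eq, value = part.partition("=")
        key = key.strip()
        if not eq or not key.isidentifier():
            raise ValueError(f"expected name=value, got {part!r}")
        result[key] = ast.literal_eval(value.strip())
    return result


print(parse_pairs("a = [1,2], b= False, c =('abc', 'efg'),d=1"))
```

Because the values go through ast.literal_eval, anything that is not a plain literal (a function call, a name lookup) raises an exception instead of being executed. Note that this makes stripping all whitespace up front unnecessary, which also keeps spaces inside string values intact.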

4 Answers

11

One option could be to use the ast module to parse some wrapping of the string that turns it into a valid Python expression. Then you can even use ast.literal_eval if you’re okay with everything it can produce:

>>> import ast
>>> kwargs = "a = [1,2], b= False, c =('abc', 'efg'),d=1"
>>> expr = ast.parse(f"dict({kwargs}\n)", mode="eval")
>>> {kw.arg: ast.literal_eval(kw.value) for kw in expr.body.keywords}
{'a': [1, 2], 'b': False, 'c': ('abc', 'efg'), 'd': 1}
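
If the string comes from an untrusted source, the same idea can be wrapped in a function that rejects anything other than plain keyword arguments. The following is a minimal sketch under that assumption; the name parse_kwargs and the specific checks are illustrative and not part of the original snippet.

```python
import ast


def parse_kwargs(kwargs_str):
    """Parse "x=1, y=[2, 3]" into a dict, accepting literal values only."""
    try:
        # The trailing newline keeps a "#" comment in the input from
        # swallowing the closing parenthesis.
        expr = ast.parse(f"dict({kwargs_str}\n)", mode="eval")
    except SyntaxError as exc:
        raise ValueError(f"not a valid argument list: {kwargs_str!r}") from exc

    call = expr.body
    # Reject input that closes the wrapper early, e.g. "a=1) + foo(".
    if not (
        isinstance(call, ast.Call)
        and isinstance(call.func, ast.Name)
        and call.func.id == "dict"
    ):
        raise ValueError("input must be a plain argument list")
    if call.args:
        raise ValueError("positional arguments are not allowed")

    result = {}
    for kw in call.keywords:
        if kw.arg is None:  # a "**mapping" entry has no keyword name
            raise ValueError("** unpacking is not allowed")
        # literal_eval raises ValueError for anything that is not a literal.
        result[kw.arg] = ast.literal_eval(kw.value)
    return result


print(parse_kwargs("a = [1,2], b= False, c =('abc', 'efg'),d=1"))
```

Nothing in the input is ever executed here: the string is only parsed, and each value is checked by ast.literal_eval, so an input such as "a=__import__('os')" ends in a ValueError.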
Ry-
3

Since the format of your input string is already a valid Python argument list, you don't have to reinvent the wheel with pyparsing but can simply enclose the string in a dict constructor for eval to create the desired kwargs:

desired_kwargs = eval(f'dict({input_str})')

However, evaluating a string from the outside world comes with the security risk of code injection. Since any actual harm can only be done by making a function call, an easy way to avoid the security risk is to parse the code with ast.parse and use ast.walk to reject the AST unless it contains exactly one ast.Call node (the one for the call to the dict constructor):

import ast

code = f'dict({input_str})'
assert sum(isinstance(node, ast.Call) for node in ast.walk(ast.parse(code))) == 1
desired_kwargs = eval(code)

Demo: https://replit.com/@blhsing/OrnateScarceShelfware

blhsing
  • 1
    Thanks, I’ll give that a shot! I didn’t realize f strings could work like that – keynesiancross Jan 19 '23 at 03:07
  • 5
    I'm not at all convinced that this avoids security risks. For example I could use `input_str = '[print := exec] * 0'` and then the next time you call `print`, you actually call `exec`. That doesn't seem safe. – Kelly Bundy Jan 20 '23 at 00:16
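
The concern raised in that comment can be checked without evaluating anything. The sketch below only parses the code; the helper names call_count and has_named_expr are made up for illustration, and ast.NamedExpr requires Python 3.8 or later.

```python
import ast


def call_count(code):
    """Count ast.Call nodes, as the check in the answer does."""
    return sum(isinstance(node, ast.Call) for node in ast.walk(ast.parse(code)))


def has_named_expr(code):
    """Report whether the code contains an assignment expression (:=)."""
    return any(isinstance(node, ast.NamedExpr) for node in ast.walk(ast.parse(code)))


input_str = "[print := exec] * 0"  # the input from the comment above
code = f"dict({input_str})"

# The code is parsed here, never evaluated.
print("passes the single-call check:", call_count(code) == 1)
print("contains an assignment expression:", has_named_expr(code))
```

Both lines report True: the only call in the tree is the dict constructor, so the count-based check lets the input through even though evaluating it would rebind a name. Validating each value with ast.literal_eval, as in the first answer, does not have this gap.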
3

You already have some good answers (much easier than this one) if the string you are being passed is well-behaved Python. But if you don't trust the input and/or want to define something a little different, then being explicit about the format you expect may be desirable. In that case, pyparsing is quite useful and readable. The grammar from the question you linked isn't complex enough to handle all your cases, but if you break your grammar out into its constituent elements it is relatively easy to build:

from pyparsing import *

string_arg = QuotedString("'", esc_char="\\", unquote_results=False) | QuotedString("\"", esc_char="\\", unquote_results=False)

# Try the decimal form first; Combine joins "1", ".", "5" into the single token "1.5"
number_arg = Combine(Word(nums) + "." + Word(nums)) | Word(nums)

boolean_arg = Literal("True") | Literal("False")

array_item = string_arg | number_arg
array_list = delimitedList(array_item)
array_arg = Literal("[") + array_list + Literal("]")
tuple_arg = Literal("(") + array_list + Literal(")")

arg_name = Word(identchars, identbodychars)
arg_value = string_arg | number_arg | boolean_arg | tuple_arg | array_arg
arg_item = arg_name + Literal("=").suppress() + arg_value
arg_list = delimitedList(arg_item)

def parseActionValue(string, location, tokens):
    emit_tokens = []
    if tokens[0] == '[':
        emit_tokens = [eval('['+','.join(tokens[1:-1])+']')]
    elif tokens[0] == '(':
        emit_tokens = eval('('+','.join(tokens[1:-1])+')')
    else:
        emit_tokens = eval(tokens[0])
    return emit_tokens

arg_value.setParseAction(parseActionValue)

def construct_args(s):
    arr = arg_list.parse_string(s, parse_all=True)
    args = {}
    for i in range(0,len(arr),2):
        args[arr[i]] = arr[i+1]
    return args

Where you want to do something a little different or do verification that the tokens look like you expect, you add another setParseAction on the element that you want to work with and emit the Python objects you want in the dict.

ricardkelly
0

python-makefun can parse these sorts of strings and may be useful for whatever the use case of the original question is:

import inspect
import makefun


def process_signature(sig: str) -> dict:
    sig = f"f({sig})"
    f = makefun.create_function(sig, (lambda: None))
    result = {}
    for name, arg in inspect.signature(f).parameters.items():
        result[name] = arg.default
    return result


process_signature("a = [1,2], b= False, c =('abc', 'efg'),d=1")

That outputs the desired result: {'a': [1, 2], 'b': False, 'c': ('abc', 'efg'), 'd': 1}

Lucas Wiman