pattern match get list and dict from string

Question

I have the string below, and I want to extract the lists, dicts, and variables from it. How can I split this string into a specific format?

s = 'list_c=[1,2],a=3,b=1.3,c=abch,list_a=[1,2],dict_a={a:2,b:3}'

import re
m1 = re.findall (r'(?=.*,)(.*?=\[.+?\],?)',s)
for i in m1 :
    print('m1:',i)

Only the first result is correct. Does anyone know how to do this?

m1: list_c=[1,2],
m1: a=3,b=1.3,c=abch,list_a=[1,2],
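In the pattern above, the lazy `.*?` can still grow across `a=3,b=1.3,c=abch,` until it reaches the next `=[`, which is why the second match swallows the plain assignments. Restricting the name to word characters (`\w+`) prevents that. The following is a sketch that covers only the list and dict cases the original regex was aiming at, and it assumes there are no nested brackets:

```python
import re

s = 'list_c=[1,2],a=3,b=1.3,c=abch,list_a=[1,2],dict_a={a:2,b:3}'

# \w+ only matches identifier characters, so a name cannot span commas
# the way the lazy .*? in the original pattern does.
lists = re.findall(r'(\w+)=(\[[^\]]*\])', s)
dicts = re.findall(r'(\w+)=(\{[^}]*\})', s)
print(lists)  # [('list_c', '[1,2]'), ('list_a', '[1,2]')]
print(dicts)  # [('dict_a', '{a:2,b:3}')]
```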

Why can't you simply split on commas and then split the individual strings to get the lists, vars, and dicts? — Vivek Khurana, Nov 19 '19 at 03:17
You mean [list_c, =, 1, 2, ...] and then handle it recursively? — 黃瀚嶙, Nov 19 '19 at 03:20
What is your end goal with what you are trying to do? Typically, when it comes to wanting to convert strings to variables, it would be a better approach to actually use a dictionary instead. [This](https://stackoverflow.com/questions/1373164/how-do-i-create-a-variable-number-of-variables) should give a good explanation. Or even, how are you getting this data? Can it be structured differently to help your solution? — idjaw, Nov 19 '19 at 03:35
@VivekKhurana There are stray commas in the lists and in the dictionary. — DYZ, Nov 19 '19 at 03:37
@idjaw That doesn't matter; someone has already answered my question below. — 黃瀚嶙, Nov 19 '19 at 04:02
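The comma-split idea from the comments can still work if commas nested inside [] or {} are skipped, which addresses the stray-comma problem. The following is a minimal sketch. It assumes brackets are always balanced and never appear inside quoted strings. The names split_top_level and parse_assignments are hypothetical helpers, not from any library, and bare names without '=' (such as Save or Record) map to None:

```python
def split_top_level(text, sep=','):
    """Split text on sep, ignoring separators nested inside [] or {} (hypothetical helper)."""
    parts = []
    depth = 0      # current bracket nesting level
    current = []   # characters of the part being built
    for ch in text:
        if ch in '[{':
            depth += 1
        elif ch in ']}':
            depth -= 1
        if ch == sep and depth == 0:
            parts.append(''.join(current))
            current = []
        else:
            current.append(ch)
    parts.append(''.join(current))
    return parts


def parse_assignments(text):
    """Map each name to its raw value string; bare names (no '=') map to None."""
    result = {}
    for part in split_top_level(text):
        name, eq, value = part.partition('=')
        result[name] = value if eq else None
    return result


s = 'list_c=[1,2],a=3,b=1.3,c=abch,list_a=[1,2],dict_a={a:2,b:3}'
print(split_top_level(s))
# ['list_c=[1,2]', 'a=3', 'b=1.3', 'c=abch', 'list_a=[1,2]', 'dict_a={a:2,b:3}']
print(parse_assignments('list_c=[1,2],Record,Save'))
# {'list_c': '[1,2]', 'Record': None, 'Save': None}
```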

score 2 · Answer 1 · answered Nov 19 '19 at 03:50

Split on '=' instead; then you can work with each variable name and its value.

You still need to handle type casting for the values (regex, split, or try/except around the casts may help).

Also, as others have commented, using a dict may be easier to handle.

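s = 'list_c=[1,2],a=3,b=1.3,c=abch,list_a=[1,2],dict_a={a:2,b:3}'
al = s.split('=')   # ['list_c', '[1,2],a', '3,b', ..., '{a:2,b:3}']
var_l = [al[0]]     # the first chunk is just the first variable name
value_l = []

# each middle chunk looks like "<value of previous var>,<next var name>"
for a in al[1:-1]:
    var_l.append(a.split(',')[-1])               # text after the last comma is the next name
    value_l.append(','.join(a.split(',')[:-1]))  # everything before it is the value
value_l.append(al[-1])  # the last chunk is the final value

output = dict(zip(var_l, value_l))
print(output)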
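Every value in output is still a string, so here is one possible casting step. This sketch assumes values are only ints, floats, bare identifiers, or flat (non-nested) lists and dicts like those in the question. cast_value is a hypothetical helper, and unquoted dict keys such as a are kept as strings:

```python
def cast_value(raw):
    """Best-effort conversion of one raw value string (hypothetical helper)."""
    raw = raw.strip()
    if raw.startswith('[') and raw.endswith(']'):
        inner = raw[1:-1]
        # assumes flat lists: no nested [] or {} inside
        return [cast_value(item) for item in inner.split(',')] if inner else []
    if raw.startswith('{') and raw.endswith('}'):
        inner = raw[1:-1]
        result = {}
        # assumes flat key:value entries; unquoted keys stay strings
        for entry in (inner.split(',') if inner else []):
            key, _, value = entry.partition(':')
            result[key.strip()] = cast_value(value)
        return result
    for convert in (int, float):
        try:
            return convert(raw)
        except ValueError:
            pass
    # anything else (e.g. the identifier abch) stays a string;
    # note that float() also accepts words such as 'inf' and 'nan'
    return raw


raw_values = {'list_c': '[1,2]', 'a': '3', 'b': '1.3', 'c': 'abch',
              'list_a': '[1,2]', 'dict_a': '{a:2,b:3}'}
typed = {name: cast_value(value) for name, value in raw_values.items()}
print(typed)
# {'list_c': [1, 2], 'a': 3, 'b': 1.3, 'c': 'abch', 'list_a': [1, 2], 'dict_a': {'a': 2, 'b': 3}}
```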

I think splitting on '=' is the key to this problem! Thanks! — 黃瀚嶙, Nov 19 '19 at 08:35

score 1 · Answer 2 · answered Nov 19 '19 at 03:52

You may have better luck if you more or less explicitly describe the right-hand side expressions: numbers, lists, dictionaries, and identifiers:

import re

s = 'list_c=[1,2],a=3,b=1.3,c=abch,list_a=[1,2],dict_a={a:2,b:3}'
pairs = re.findall(r"([^=,]+)=" # LHS (commas excluded so the separator is not captured) and assignment operator
                  +r"([+-]?\d+(?:\.\d+)?|" # Numbers
                  +r"[+-]?\d+\.|" # More numbers
                  +r"\[[^]]+\]|" # Lists
                  +r"{[^}]+}|" # Dictionaries
                  +r"[a-zA-Z_][a-zA-Z_\d]*)", # Idents
           s)
print(pairs)
# [('list_c', '[1,2]'), ('a', '3'), ('b', '1.3'), ('c', 'abch'),
#  ('list_a', '[1,2]'), ('dict_a', '{a:2,b:3}')]
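The same alternatives can be written more readably with re.VERBOSE. Because each match is a (name, value) pair, dict() turns the result straight into a mapping. This is a sketch. It excludes commas from the name with [^=,] so the separating comma is not captured. The constant name ASSIGNMENT is only illustrative:

```python
import re

# Same alternatives as the regex above, one per line with its own comment.
ASSIGNMENT = re.compile(r"""
    ([^=,]+)=                   # name (no commas) and assignment operator
    (
        [+-]?\d+(?:\.\d+)?      # numbers such as 3 or 1.3
      | [+-]?\d+\.              # numbers with a trailing dot
      | \[[^]]+\]               # lists
      | \{[^}]+\}               # dictionaries
      | [a-zA-Z_][a-zA-Z_\d]*   # identifiers
    )
""", re.VERBOSE)

s = 'list_c=[1,2],a=3,b=1.3,c=abch,list_a=[1,2],dict_a={a:2,b:3}'
pairs = dict(ASSIGNMENT.findall(s))
print(pairs)
# {'list_c': '[1,2]', 'a': '3', 'b': '1.3', 'c': 'abch', 'list_a': '[1,2]', 'dict_a': '{a:2,b:3}'}
```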

Thank you so much! BTW, how should I handle values without '=', as in s = 'list_c=[1,2],Record,Save'? — 黃瀚嶙, Nov 19 '19 at 04:00

score 0 · Answer 3 · answered Nov 19 '19 at 04:32

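The answer is as follows. It reuses the regex from the answer above and stores names that have no '=' (Save and Record here) with an empty string as their value: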

import re
from pprint import pprint

s = 'list_c=[1,2],a=3,b=1.3,c=abch,list_a=[1],Save,Record,dict_a={a:2,b:3}'
m1 = re.findall(r"([^=]+)=" # LHS and assignment operator
                  +r"([+-]?\d+(?:\.\d+)?|" # Numbers
                  +r"[+-]?\d+\.|" # More numbers
                  +r"\[[^]]+\]|" # Lists
                  +r"{[^}]+}|" # Dictionaries
                  +r"[a-zA-Z_][a-zA-Z_\d]*)", # Idents
           s)
temp_d = {}
for i, j in m1:
    # i is the text before '=', e.g. ',a' or ',Save,Record,dict_a'; j is the value
    temp = i.strip(',').split(',')
    if len(temp) > 1:
        # every name except the last had no '=', so store an empty value
        for k in temp[:-1]:
            temp_d[k] = ''
        temp_d[temp[-1]] = j
    else:
        temp_d[temp[0]] = j
pprint(temp_d)  # pprint sorts the keys

Output:

{'Record': '',
 'Save': '',
 'a': '3',
 'b': '1.3',
 'c': 'abch',
 'dict_a': '{a:2,b:3}',
 'list_a': '[1]',
 'list_c': '[1,2]'}

score 0 · Answer 4 · answered Nov 19 '19 at 06:10

Instead of picking out the types, you can start by capturing the identifiers. Here's a regex that captures all the identifiers in the string (for lowercase only, but see note):

import re

regex = re.compile(r'[a-z_]+=')  # lowercase letters/underscores followed by '='
# note: for all valid variable names, use r'[A-Za-z_][A-Za-z0-9_]*='
cases = [x.group() for x in regex.finditer(s)]

This gives a list of all the identifiers in the string:

['list_c=', 'a=', 'b=', 'c=', 'list_a=', 'dict_a=']

We can now define a function that uses the list above to partition s step by step:

def chop(mystr, mylist):
    # drop everything up to and including the current identifier...
    temp = mystr.partition(mylist[0])[2]
    cut = temp.find(mylist[1])           # ...then strip the leading bits up to the next identifier
    return temp[cut:], mylist[1:]

mystr = s[:]
temp = [mystr]
mylist = cases[:]
while len(mylist) > 1:
    mystr, mylist = chop(mystr, mylist)
    temp.append(mystr)

This (convoluted) slicing operation gives this list of strings:

['list_c=[1,2],a=3,b=1.3,c=abch,list_a=[1,2],dict_a={a:2,b:3}',
'a=3,b=1.3,c=abch,list_a=[1,2],dict_a={a:2,b:3}',         
'b=1.3,c=abch,list_a=[1,2],dict_a={a:2,b:3}',
'c=abch,list_a=[1,2],dict_a={a:2,b:3}',
'list_a=[1,2],dict_a={a:2,b:3}',
'dict_a={a:2,b:3}']

Now cut off the ends using each successive entry:

result = []
for x in range(len(temp) - 1):
    cut = temp[x].find(temp[x+1]) - 1    #-1 to remove commas
    result.append(temp[x][:cut])
result.append(temp.pop())                #get the last item

Now we have the full list:

['list_c=[1,2]', 'a=3', 'b=1.3', 'c=abch', 'list_a=[1,2]', 'dict_a={a:2,b:3}']

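Each element is easily parsable into key:value pairs. Some elements, such as list_c=[1,2], are also valid Python assignments that exec could run, but not all of them: c=abch raises a NameError unless abch is defined.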
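As a shorter alternative (not part of the original answer), the repeated partition/find steps can be replaced by slicing s between the start offsets of consecutive identifier matches. str.partition then splits each element into a key and a raw value string without exec. This sketch assumes, as the approach above does, that no value contains text that looks like name=:

```python
import re

s = 'list_c=[1,2],a=3,b=1.3,c=abch,list_a=[1,2],dict_a={a:2,b:3}'

# Start offset of every 'name=' match (same lowercase identifier regex as above).
starts = [m.start() for m in re.finditer(r'[a-z_]+=', s)]
ends = starts[1:] + [len(s)]

# Each piece runs from one identifier to the next; rstrip(',') drops the separator.
pieces = [s[a:b].rstrip(',') for a, b in zip(starts, ends)]

# str.partition splits at the first '=' only and returns (key, '=', value).
pairs = {}
for piece in pieces:
    key, _, value = piece.partition('=')
    pairs[key] = value
print(pairs)
# {'list_c': '[1,2]', 'a': '3', 'b': '1.3', 'c': 'abch', 'list_a': '[1,2]', 'dict_a': '{a:2,b:3}'}
```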
