Regex sub() method seems to replace every single character by the replacement group

Question

In the following dictionary, a user can refer to a key as a variable for defining another value:

d = {'a_key': 'a_value', 'b_key': '[ a_key ]+1/[a_key]'}

I have to replace these references by the corresponding values, in order to obtain this desired output:

d = {'a_key': 'a_value', 'b_key': 'a_value+1/a_value'}

The references are delimited by square brackets to prevent unwanted replacements (a safer version of the simple str replacement asked about at Replace key references by corresponding values in dict values). So I use a regex to perform the replacement:

from re import search, sub
d = {'a_key': 'a_value', 'b_key': '[ a_key ]+1/[a_key]'}
for k in d.keys():
    parameter = search(r"\[\s*(\w+)\s*\]", d[k])
    if parameter is not None and parameter.group(1) in d.keys():
        print("target: "+parameter.group())
        print("to be replaced by: "+d[parameter.group(1)])
        d[k] = sub(parameter.group(), d[parameter.group(1)], d[k])
print(d)

The output is:

target: [ a_key ]
to be replaced by: a_value
{'a_key': 'a_value', 'b_key': '[a_valuea_valuea_valuea_valuea_valuea_valuea_value]+1/[a_valuea_valuea_valuea_valuea_value]'}

Although the target is found and the replacement value is correct, the square brackets are still there and, between them, each single character has been replaced by the replacement value. What's wrong with my regex, and how do I get the desired output?
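A minimal sketch that isolates the effect, using only the standard re module: the text returned by parameter.group() is [ a_key ], and when it is passed back to sub() as a pattern, the brackets form a character class that matches any single one of the characters space, a, _, k, e and y. Escaping the text with re.escape() makes sub() match it literally instead.

```python
import re

text = "[ a_key ]+1/[a_key]"
target = "[ a_key ]"  # what parameter.group() returns in the loop above

# Used as a pattern, "[ a_key ]" is a character class: it matches ONE
# character out of {' ', 'a', '_', 'k', 'e', 'y'}, so each is replaced.
broken = re.sub(target, "a_value", text)

# re.escape() makes the brackets and spaces literal characters,
# so only the exact substring "[ a_key ]" is replaced.
literal = re.sub(re.escape(target), "a_value", text)

print(broken)   # reproduces the output shown in the question
print(literal)  # a_value+1/[a_key]
```

Escaping alone only fixes that exact spelling: [a_key] without spaces is left as is, so a replacement function that decides per match is still needed to handle every spelling.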

EDIT:

Thanks to Joshua Varghese's answer, I realized I should clarify that the square brackets might contain things other than key references. For instance:

d = {'a_key': 'a_value', 'b_key': '[ a_key ]+1/[a_key]+[another_thing ]'}

Here I don't want [another_thing ] to be replaced. Trying this example, I found that not all characters are replaced; however, the replaced ones include whitespace and every character contained in the key. [another_thing ] becomes [a_valuenotha_valuera_valuethinga_value]

EDIT2:

Thanks to WeavingBird1917's comment, I will try to use something like the code below instead of wrapping the code in a for _ in d: loop. However, since dicts are unordered, I don't know how to complete the recursive function. Any help appreciated.

from re import search, sub
d = {'a_key': '[c_key]', 'b_key': '1', 'c_key': '[b_key] + [e_key]*[another_thing ]', 'd_key': '[b_key]', 'e_key': '[b_key]'}

def rec(z):
    parameter = search(r"\[\s*(\w+)\s*\]", d[z])
    if parameter is not None and parameter.group(1) in d.keys():
        rec(parameter.group(1))
    else:
        print("+1")
        for k in d:
            d[k] = sub(r"\[\s*(\w+)\s*\]", lambda match: "(" + d[match.group(1)] + ")" if match.group(1) in d else match.group(), d[k])
        # need to move on to the next key, or stop if there is none left, but dicts are unordered

rec(list(d.keys())[0])
print(d)
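A sketch of a different approach, not taken from the thread (resolve_all is a hypothetical name): rather than recursing key by key, apply one round of substitution to every value and repeat until a round changes nothing. The visiting order then only affects how many rounds are needed, not the final result. With n keys and no circular references, a chain is at most n - 1 references deep, so n + 1 rounds are more than enough; any value that still refers to an existing key afterwards must be part of a cycle. For what it's worth, dicts keep insertion order since Python 3.7, but this approach does not rely on it.

```python
import re

REF = re.compile(r"\[\s*(\w+)\s*\]")

def resolve_all(d):
    """Hypothetical helper: resolve [key] references in d in place."""
    for _ in range(len(d) + 1):
        changed = False
        for key in d:  # assigning to existing keys is safe while iterating
            value = d[key]
            # Replace known keys, leave unknown names like [another_thing ] alone.
            new_value = REF.sub(lambda m: d.get(m.group(1), m.group()), value)
            if new_value != value:
                d[key] = new_value
                changed = True
        if not changed:
            break
    # In an acyclic dict, no resolved value can still point at an existing key.
    for key, value in d.items():
        if any(name in d for name in REF.findall(value)):
            raise ValueError(f"circular reference involving {key!r}")
    return d

d = {'a_key': '[c_key]', 'b_key': '1', 'c_key': '[b_key] + [e_key]*[another_thing ]',
     'd_key': '[b_key]', 'e_key': '[b_key]'}
resolve_all(d)
print(d)
```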

Are circular/chained references possible? Can c_value refer to b_value, and b_value refer to a_value, etc.? — , May 16 '20 at 15:59
@WeavingBird1917 Chained references are allowed. Circular references probably won't be, since they could loop for a while. — someone, May 17 '20 at 07:17
@WeavingBird1917 You're right. Thanks. I assume `n` is the number of keys. Will I have to wrap the whole piece of code in this kind of loop: `for _ in d:`? It seems to work but, as you said, it is not optimal in terms of number of iterations. — someone, May 18 '20 at 06:52
@WeavingBird1917 I edited the question with a piece of the recursive function. However, how do I deal with the lack of order in dicts? — someone, May 18 '20 at 09:11

Joshua Varghese · Accepted Answer · 2020-05-16T15:39:20.063

1

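The problem is that parameter.group() returns the matched text [ a_key ], which sub() then treats as a pattern. In a regex, [ a_key ] is a character class, so each of its characters is matched and replaced on its own.
So pass the bracket pattern itself to sub() instead: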

from re import search, sub
d = {'a_key': 'a_value', 'b_key': '[ a_key ]+1/[a_key]'}
for k in d.keys():
    parameter = search(r"\[\s*(\w+)\s*\]", d[k])
    if parameter is not None and parameter.group(1) in d.keys():
        print("target: "+parameter.group())
        print("to be replaced by: "+d[parameter.group(1)])
        d[k] = sub(r"\[\s*\w+\s*\]", d[parameter.group(1)], d[k])
print(d)

EDIT
Here is the solution for the case:

d = {'a_key': 'a_value', 'b_key': '[ a_key ]+1/[a_key]+[another_thing ]'}

Here, we'll use a method adapted from here:

for k in d.keys():
    d[k] = sub(r"\[\s*(\w+)\s*\]", lambda match: d[match.group(1)] if match.group(1) in d else match.group(), d[k])

or simply:

for k in d.keys():
    d[k] = sub(r"\[\s*(\w+)\s*\]", lambda match: d.get(match.group(1), match.group()), d[k])

gives:

{'a_key': 'a_value', 'b_key': 'a_value+1/a_value+[another_thing ]'}

edited May 16 '20 at 15:39

answered May 16 '20 at 13:51

Joshua Varghese


Thanks for your answer. Your method works only if the square brackets contain nothing but key references. If they contain anything else, it will be replaced as well. That's what I tried to avoid with the 'if' condition. Anyway, I should have been more precise, and I will edit my question accordingly. – someone May 16 '20 at 14:26

I saw a comment just before, but it has disappeared. Well, using sub() is not mandatory. Any method that gives the desired output in the general case is welcome. – someone May 16 '20 at 14:40
It works. Thanks. I suggest you edit your answer before I accept it: it is better to replace `d[match.group(1)]` with `"("+d[match.group(1)]+")"` to handle cases like `'c_key': '2*[ b_key ]'`. – someone May 17 '20 at 07:26
@someone That edit doesn't answer the question :) That's an edit you can use to modify my answer! For now, I've answered the question :D – Joshua Varghese May 17 '20 at 08:47
I don't think it is forbidden to give a more general answer than the one required. I even think it can be useful to other people who, like me, didn't think of this problem at first. Anyway, I can accept the answer as is. Thanks again. – someone May 17 '20 at 09:14
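A minimal sketch of why the parentheses suggested in the comments matter, assuming the values are arithmetic expressions (substitute is a hypothetical helper, not part of the answer): plain textual substitution of 1 + 1 into 2*[ b_key ] gives 2*1 + 1, which silently changes operator precedence.

```python
import re

REF = re.compile(r"\[\s*(\w+)\s*\]")

def substitute(d, wrap):
    """Hypothetical helper: replace one level of [key] references in a copy of d."""
    def repl(m):
        name = m.group(1)
        if name not in d:
            return m.group()  # not a key: leave "[...]" untouched
        # Optionally wrap the value so it behaves as a single operand.
        return f"({d[name]})" if wrap else d[name]
    return {k: REF.sub(repl, v) for k, v in d.items()}

d = {'b_key': '1 + 1', 'c_key': '2*[ b_key ]'}
print(substitute(d, wrap=False)['c_key'])  # 2*1 + 1    (evaluates to 3)
print(substitute(d, wrap=True)['c_key'])   # 2*(1 + 1)  (evaluates to 4)
```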

score 1 · Answer 2 · answered May 18 '20 at 09:12

Since multiple chained references are allowed (see comments), here is a solution that works recursively. It might be improved by keeping track of a set of visited keys, to avoid calling process_value again inside get_reference for keys that have already been processed.

import re    

def get_reference(_dict, match):
    reference = match.group(1)
    if reference in _dict:
        return process_value(_dict, reference)
    else:
        return match.group(0)

def process_value(_dict, key):
    new_value = re.sub(r"\[\s*(\w+)\s*\]",
                       lambda match: get_reference(_dict, match), _dict[key])
    _dict[key] = new_value

    return new_value

def process_dict(_dict):
    for key in _dict:
        process_value(_dict, key)

Example input/output:

example_dict = dict(a_key="[c_key ]", b_key="1", c_key="[b_key] + [ e_key ]",
      d_key="[b_key]", e_key="[b_key]", f_key="3", g_key="2 + [h_key]", 
      h_key="[b_key] / [ k_key ]")

process_dict(example_dict)

print(example_dict)
# Output: 
# {'a_key': '1 + 1',
#  'b_key': '1',
#  'c_key': '1 + 1',
#  'd_key': '1',
#  'e_key': '1',
#  'f_key': '3',
#  'g_key': '2 + 1 / [ k_key ]',
#  'h_key': '1 / [ k_key ]'}
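Following the visited-keys suggestion above, a sketch of how it could look (resolve_dict, resolving and resolved are hypothetical names, not from the answer): keys whose value is already final are returned immediately, and a key met again while it is still being resolved signals a circular reference. Without such a check, the code above keeps recursing until Python raises RecursionError when two keys refer to each other.

```python
import re

REF = re.compile(r"\[\s*(\w+)\s*\]")

def resolve_dict(_dict):
    """Hypothetical variant: resolve [key] references in place, each key once."""
    resolving = set()  # keys on the current recursion path (cycle detection)
    resolved = set()   # keys whose value is already final

    def process_value(key):
        if key in resolved:
            return _dict[key]
        if key in resolving:
            raise ValueError(f"circular reference involving {key!r}")
        resolving.add(key)
        _dict[key] = REF.sub(
            lambda m: process_value(m.group(1)) if m.group(1) in _dict else m.group(0),
            _dict[key],
        )
        resolving.discard(key)
        resolved.add(key)
        return _dict[key]

    for key in _dict:
        process_value(key)
    return _dict

try:
    resolve_dict({"x_key": "[y_key]", "y_key": "[x_key]"})
except ValueError as err:
    print(err)  # circular reference involving 'x_key'
```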

Thanks. It works. I will use `r"\[\s*(\w+)\s*\]"` instead of `"\[\s*(\w+)\s*\]"` to avoid a warning from my IDE, and `return "("+new_value+")"` instead of `return new_value` to deal with the potential issues I mentioned in the comments on Joshua Varghese's answer. — someone, May 18 '20 at 10:44
