You could use str.replace.
Setup
import pandas as pd
df = pd.DataFrame(data=[['ABCD', '00FQ'], ['JKFA', '8LK9|4F5H'], ['QWST', '2RLA|R1T5&8LK9']], columns=['ID', 'Code'])
d = {'00FQ': "['A','B']", '8LK9': "['X']", '4F5H': "['U','Z']", '2RLA': "['H','K']", 'R1T5': "['B','G']"}
def r(w, d=d):
    """Function to be used for dictionary based replacement"""
    return d[w.group()]
Code
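# regex=True is required on pandas >= 2.0, where str.replace defaults to literal matching
df['Logic'] = df['Code'].str.replace(r'[^|&]+', r, regex=True).str.replace(r'([|&])', r' \1 ', regex=True)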
print(df)
Output
     ID            Code                          Logic
0  ABCD            00FQ                      ['A','B']
1  JKFA       8LK9|4F5H              ['X'] | ['U','Z']
2  QWST  2RLA|R1T5&8LK9  ['H','K'] | ['B','G'] & ['X']
The idea is to first replace every run of characters that is not | or & with its corresponding value in the dictionary (using the function r). Once this is done, replace every | or & (matched with a capturing group) with itself surrounded by spaces (r' \1 ').
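To see the two steps in isolation, here is a minimal sketch that applies the same patterns to a single string with plain re.sub instead of pandas. The variable names code, step1 and step2 are only for illustration.

```python
import re

# Same mapping as in the setup.
d = {'00FQ': "['A','B']", '8LK9': "['X']", '4F5H': "['U','Z']",
     '2RLA': "['H','K']", 'R1T5': "['B','G']"}

def r(w, d=d):
    """Look up the matched code in the dictionary."""
    return d[w.group()]

code = '2RLA|R1T5&8LK9'

# Step 1: replace every run of characters that is not | or & by its dictionary value.
step1 = re.sub(r'[^|&]+', r, code)

# Step 2: surround every | or & with spaces, reusing the captured operator via \1.
step2 = re.sub(r'([|&])', r' \1 ', step1)

print(step1)  # ['H','K']|['B','G']&['X']
print(step2)  # ['H','K'] | ['B','G'] & ['X']
```

The order matters: if the spaces were added first, they would become part of the matched runs and the dictionary lookup would fail.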
Notice that in the first call to replace, the repl parameter is a function (a callable). This is allowed, as specified in the linked documentation:
The callable is passed the regex match object and must return a
replacement string to be used. See re.sub().
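As a small illustration of what the callable receives, the sketch below uses re.sub directly. The helper name show is hypothetical and only wraps each match in angle brackets so the matched pieces are visible.

```python
import re

def show(m):
    # m is a match object; m.group() returns the text that was matched.
    return '<' + m.group() + '>'

wrapped = re.sub(r'[^|&]+', show, '8LK9|4F5H')
print(wrapped)  # <8LK9>|<4F5H>
```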
Note: This solution assumes every possible code is in the dictionary used for replacement. If that is not the case, change r to:
def r(w, d=d):
    """Function to be used for dictionary based replacement"""
    return d.get(w.group(), w.group())
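A quick way to check the fallback behaviour is to run it on a string containing a code that is missing from the dictionary. This is a sketch with plain re.sub; the code 'ZZZZ' is made up for the example and the dictionary is shortened.

```python
import re

# Shortened mapping for the example; 'ZZZZ' is deliberately absent.
d = {'00FQ': "['A','B']", '8LK9': "['X']"}

def r(w, d=d):
    """Return the mapped value, or the original code if it is not in d."""
    return d.get(w.group(), w.group())

result = re.sub(r'[^|&]+', r, '8LK9|ZZZZ')
print(result)  # ['X']|ZZZZ
```

With the original r (using d[w.group()]), the same input would raise a KeyError on 'ZZZZ'.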
For more on regular expressions, see:
- Regular Expression HOWTO
- Speed up millions of regex replacements in Python 3