Using vectorization, I want to remap a column according to a nested dictionary depending on the value of a second column. In iteration logic, I would go through the rows; depending on the value in the second column, I would choose an entry from the dictionary's first level (which is a dictionary again) according to which the values are mapped.
MWE
I have a data frame with two columns of interest: country
and variable
:
df= pd.DataFrame({"country": ["AA", "BB", "AA", "AA", "BB", "AA"],
"variable": ["foo/tball", "palace", "bla", "car", "bla", "dog"]})
I further have a nested dictionary dct
(values in variable
are only unique in combination with the country
, and I thus cannot unnest the dictionary). dct
contains strings of entries in df['variable']
and substrings thereof:
dct = {"AA": {'foo': 'NEWFOO', # substring of 'foo/tball'
'bla' : 'NEWBLA',
'cart': 'this value is not in the dataframe'}, # sic! -- not substring of any entry
"BB": {'pal': 'NEWPAL', # substring of palace
'bla': 'DIFFERENT_NEWBLA'},
"CC": {"this": "'CC' dictionary can be ignored"}}
I now want to map the entry of df['variable']
according to dct[df['country']]
following these rules:
- if the string matches or if the dictionary key is a substring of the entry in
df['variable']
(e.g.'foo'
indct['AA']
: replace the entry according to the dictionary - otherwise (i.e. if the cell entry is not in the dictionary (e.g.
'dog'
), replace with some predefined value (here,_some_flag_value_
) - ignore entries in the dictionary that do not match either on the
'country'
column (e.g.dct["CC"]
) or the'variable'
column (e.g.'cart'
indct['AA']
).
The desired output should look like this:
out = pd.DataFrame({"country": ["AA", "BB", "AA", "AA", "BB", "AA"],
"variable": ["NEWFOO", "NEWPAL", "NEWBLA", "_some_flag_value_",
"DIFFERENT_NEWBLA", "_some_flag_value_"]})
I have tried using df.mapapply()
with several combinations of lambda
to no avail - can someone point me in the right direction? Thanks a lot in advance.