1

I have 2 dataframes:

ID   LIST_VALUES
 1     [a,b,c]
 2     [a,n,t]
 3     [x]
 4     [h,h]


VALUE     MAPPING
 a         alpha
 b         bravo
 c         charlie
 n         november
 h         hotel
 t         tango
 x         xray

I need to add a new column to the first dataframe that shows the VALUES from the second dataframe based on what is in the LIST_VALUES list. If a value is duplicated in LIST_VALUES, only show it once in the output. So:

ID   LIST_VALUES    new_col
 1     [a,b,c]       alpha,bravo,charlie
 2     [a,n,t]       alpha,november,tango
 3     [x]           xray
 4     [h,h]         hotel

I have tried to merge pd.merge but I keep getting stuck as I cannot merge with elements in a list.

df_new = df1.merge(df2, how='left', left_on='LIST_VALUES', right_on='VALUES') 

This will only work where a LIST_VALUE has only 1 element so in this example ID 3. I need it to work where there are multiple values in a list.

bharatk
  • 4,202
  • 5
  • 16
  • 30
Mazz
  • 770
  • 3
  • 11
  • 23

2 Answers2

1

Use list comprehension with map by Series same way like dictionary, last remove duplicated values by dict.fromkeys trick and join values together:

d = df2.set_index('VALUE')['MAPPING']
df1['new_col'] = [', '.join(dict.fromkeys([d[y] for y in x if y in d]).keys()) 
                                                               for x in df1['LIST_VALUES']]

print (df1)
   ID LIST_VALUES                 new_col
0   1   [a, b, c]   alpha, bravo, charlie
1   2   [a, n, t]  alpha, november, tango
2   3         [x]                    xray
3   4      [h, h]                   hotel

If order is not important of new values is possible use set for remove duplicates:

d = df2.set_index('VALUE')['MAPPING']
df1['new_col'] = [', '.join(set([d[y] for y in x if y in d])) for x in df1['LIST_VALUES']]

print (df1)
   ID LIST_VALUES                 new_col
0   1   [a, b, c]   alpha, charlie, bravo
1   2   [a, n, t]  alpha, tango, november
2   3         [x]                    xray
3   4      [h, h]                   hotel
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
1

One way would be to build a dictionary from your second dataframe using set_index and to_dict. Then use a nested list comprehension to lookup the dictionary using the values in the lists:

d = df2.set_index('VALUE').MAPPING.to_dict()
# {'a': 'alpha', 'b': 'bravo', 'c': 'charlie', ...

df['new_col'] = [','.join([d[j] for j in i]) for i in df.LIST_VALUES]

print(df)

 ID LIST_VALUES                new_col
0   1   [a, b, c]   alpha,bravo,charlie
1   2   [a, b, c]   alpha,bravo,charlie
2   3         [x]                  xray
3   4      [h, h]           hotel,hotel

Setup:

print(df2)

 VALUE   MAPPING
0     a     alpha
1     b     bravo
2     c   charlie
3     n  november
4     h     hotel
5     t     tango
6     x      xray

print(df)

   ID LIST_VALUES
0   1   [a, b, c]
1   2   [a, b, c]
2   3         [x]
3   4      [h, h]
yatu
  • 86,083
  • 12
  • 84
  • 139