0

Suppose I have two data frames:

a = {'col1': ['value_1'], 'col2': ['value_4']}
df_a = pd.DataFramed = pd.DataFrame(data=a)

b = {'col3': [['value_1', 'value_2']], 'col4': ['value_6']}
df_b = pd.DataFramed = pd.DataFrame(data=b)

I want to then merge the two data frames on columns col1 and col3, if the value in col1 is in the list for col3.

The expected result is

>>> df_merged
      col1     col2                    col3       col4
0  value_1  value_4   ['value_1', 'value_2']  'value_6'

I am able to deconstruct the list, by getting the list by value:

ids = df_b.iloc[0]['col3']]

and then I can iterate over the list, and insert the list values into new columns in df_b, etc., and then I continue on by doing multiple merges, etc., etc., but this is ugly and seems very arbitrary.

Thus, I am looking for a clean and "pythonic" (read as elegant and generalized) way of doing the merge.

horcle_buzz
  • 2,101
  • 3
  • 30
  • 59

1 Answers1

1

I end with using unnesting method flatten your df_b, then do merge

s=unnesting(df_b,['col3']).reset_index()

newdf=df_a.merge(s[['col3','index']],left_on='col1',right_on='col3',how='left').drop('col3',1)
newdf.merge(df_b,left_on='index',right_index=True,how='left')
      col1     col2  index                col3     col4
0  value_1  value_4      0  [value_1, value_2]  value_6
BENY
  • 317,841
  • 20
  • 164
  • 234
  • 1
    Looks promising. I'm getting a `count < 0` error on the line `idx=df.index.repeat(df[explode[0]].str.len())`, since I think that because some values for the column are `NaN`. I will try filtering those out first, since they aren't necessary for this. – horcle_buzz Feb 26 '19 at 01:48
  • By Jove! It works (granted, the values of `col3` are actually dictionaries, so I still need to get the value from them, but this is huge help!)! – horcle_buzz Feb 26 '19 at 01:54
  • 1
    Worked like a charm: I was able to extract the value from. the dictionary by using this list comprehension `df1 = pd.DataFrame([x for x in s['col3']])` and then I merged `df` back with `s` to get what I needed. – horcle_buzz Feb 26 '19 at 02:06