I have two DataFrames, one containing a column with lists inside its cells. Here is an example:

DF 1 :
   | A      B
---+----------------------------
0  | 'A'    ['A', 'B']
1  | 'B'    ['B', 'D']
2  | 'C'    ['D', 'E', 'F']

DF 2 :
   | C      D
---+----------------------------
0  | 'A'    'X'
1  | 'B'    'Y'
2  | 'C'    'Z'

Here is the code to set up the DataFrames:

import pandas as pd

df1 = pd.DataFrame({'A': ["A", "B", "C"], "B": [["A", "B"], ["B", "D"], ["D", "E", "F"]]})
df2 = pd.DataFrame({'C': ["A", "B", "C"], "D": ["X", "Y", "Z"]})

I would like to do an inner join between DF1 and DF2 on the condition that DF2.C is in DF1.B. Here is the result I expect:

DF1&DF2 :
   | A      B              C      D
---+--------------------------------------
0  | 'A'    ['A', 'B']     'A'    'X'
1  | 'A'    ['A', 'B']     'B'    'Y'
2  | 'B'    ['B', 'D']     'B'    'Y'

I read the documentation explaining how to achieve a join using concat, but I cannot find how to use membership testing as a join condition.

Did I miss something? Any idea how to do it?

Char siu

1 Answer

This is an unnesting problem first, then a merge issue:

df3 = unnesting(df1, ['B'])
df3.merge(df2, left_on='B', right_on='C', how='inner').drop(columns='B').merge(df1)
Out[15]: 
   A  C  D       B
0  A  A  X  [A, B]
1  A  B  Y  [A, B]
2  B  B  Y  [B, D]

Self-defined function:

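import numpy as np
import pandas as pd

def unnesting(df, explode):
    # Repeat each index label once per element of its list
    idx = df.index.repeat(df[explode[0]].str.len())
    # Flatten each listed column into one long column, then restore the repeated index
    df1 = pd.concat([pd.DataFrame({x: np.concatenate(df[x].values)}) for x in explode], axis=1)
    df1.index = idx
    return df1.join(df.drop(columns=explode), how='left')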
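
On pandas 0.25 or later, the built-in DataFrame.explode can probably stand in for the self-defined function. The sketch below is an alternative to the approach above, not the same code: it copies the list column into a helper column (named B_item here purely for illustration), explodes that copy, and merges on it, so the original lists in B survive without a second merge.

```python
import pandas as pd

df1 = pd.DataFrame({"A": ["A", "B", "C"],
                    "B": [["A", "B"], ["B", "D"], ["D", "E", "F"]]})
df2 = pd.DataFrame({"C": ["A", "B", "C"], "D": ["X", "Y", "Z"]})

# Copy the list column into a helper column and explode only the copy,
# so each row keeps its original list in B.
# "B_item" is a hypothetical helper name chosen for this sketch.
exploded = df1.assign(B_item=df1["B"]).explode("B_item")

# Inner join: a row survives only if one element of its list equals df2.C.
result = (
    exploded.merge(df2, left_on="B_item", right_on="C", how="inner")
            .drop(columns="B_item")
            .reset_index(drop=True)
)
print(result)
```

The columns come out in the order A, B, C, D, which matches the layout asked for in the question.
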
BENY