0

df1:

   name     data
0   C       'nnnn'
1   B       Nan
2   A       'yyyy'
3   D       'zzzz'
4   E       Nan

df2:

   name     data
0   A       'qqqq'
1   B       'vvvv'
2   C       'uuuu'
3   D       Nan
4   G       'pppp'
5   F       'mmmm'

I want the columns where name rows match and those that don't. Matching columns were easily found. pd.merge(df1, df2, on='name')

And referred to the link page. Pandas Merging 101

When I applied the code,

(df1.merge(df2, on='name', how='outer', indicator=True)
     .query('_merge != "both"')
     .drop('_merge', 1))

got this result...

   name    data_x   data_y
0   C      'nnnn'     Nan
1   A      'yyyy'     Nan
2   E      Nan        Nan
3   G      Nan       'pppp'
4   F      Nan       'mmmm'

result I want. (Exclude matching columns in row name)

   name     data_x    data_y
0   E       Nan        Nan
1   G       Nan        'pppp'
2   F       Nan        'mmmm'

'name' in df1 and df2 mostly match each other but are mixed. I want to find columns that belong only to df1 or df2. searched for a long time, but I don't know how. I am a beginner in python and not good at English too, sorry. help me please.

jy311good
  • 1
  • 1

1 Answers1

1

As mentioned by @BENY, your code actually works just fine:

>>> (df1.merge(df2, on='name', how='outer', indicator=True)
     .query('_merge != "both"')
     .drop('_merge', 1))
  name data_x data_y
4    E    ...    NaN
5    G    NaN    ...
6    F    NaN    ...

That said, you can also use set differences, especially if you don't actually want to have two columns and remove the NaNs, but instead just want to pinpoint which name values are absent on one side or the other:

miss = set(df1['name']).symmetric_difference(df2['name'])

Then:

>>> df1.loc[df1['name'].isin(miss)]
  name data
4    E  ...

>>> df2.loc[df2['name'].isin(miss)]
  name data
4    G  ...
5    F  ...
Pierre D
  • 24,012
  • 7
  • 60
  • 96