0

I have two panda large dataframes that I would like to merge using the second column. The issue is that the first dataframe is longer than the second dataframe, so there are some names that won't be found in the second column. I would like to merge the two dataframes such that the rows that can't be merged just stay as they are and the ones that can be, the column 1 values are listed in a comma separated list. I have two short examples of the type of dataframe I am working with as well as the desired result.

DF 1

col 1   col 2
1       Johnny   
7       Cathy
4       Becky 
2       Sarah 
33      Courtney
83      Avery
78      Adam

DF 2

col 1   col 2
12      Johnny   
3       Cathy
13      Becky 
15      Sarah 
55      Adam

Desired result:

col 1      col 2
1, 12      Johnny   
7, 3       Cathy
4, 13      Becky 
2, 15      Sarah 
33         Courtney
83         Avery
78, 55     Adam

1 Answers1

2

If you don't need that specific row order in the output dataframe, you can just do this:

import pandas as pd

df1 = pd.DataFrame(
    {'col1': [1,7,4,2,33,83,78],
    'col2': ['Johnny', 'Cathy', 'Becky', 'Sarah', 'Courtney', 'Avery', 'Adam']}
)
df2 = pd.DataFrame(
    {'col1': [12,3,13,15,55],
    'col2': ['Johnny', 'Cathy', 'Becky', 'Sarah', 'Adam']}
)

df = pd.concat([df1, df2], axis=0).groupby('col2').agg(list).reset_index()
print(df)

#   col2    col1
# 0 Adam    [78, 55]
# 1 Avery   [83]
# 2 Becky   [4, 13]
# 3 Cathy   [7, 3]
# 4 Courtney    [33]
# 5 Johnny  [1, 12]
# 6 Sarah   [2, 15]
AlexK
  • 2,855
  • 9
  • 16
  • 27
  • What if I no longer want to aggregate the numbers into a list (instead keep them as separate columns? – user14057357 Jul 02 '22 at 06:51
  • @user14057357 You mean split these lists into two columns? You can pick an answer that you like in [this](https://stackoverflow.com/q/35491274/9987623) thread. – AlexK Jul 02 '22 at 07:48