I am new to Python. I want to merge two DataFrames. I used merge, but the result gets too large because of duplicated rows. I created sample code showing what I did. Is there a better-performing way to do this? I would really appreciate your ideas.

import pandas as pd

data = [['t', 10, 5], ['n', 15, 5], ['j', 14, 66], ['t', 10, 8], ['n', 15, 55]]
df1 = pd.DataFrame(data, columns=['Name', 'Age', 'HH'])

data = [['t', 10, 100], ['n', 15, 101], ['j', 14, 102], ['t', 10, 81], ['n', 15, 81]]
df2 = pd.DataFrame(data, columns=['Name', 'Age', 'year'])

# Inner merge on the shared key columns (df1, not the undefined df)
res = pd.merge(df2, df1, on=['Name', 'Age'], how='inner')

Result:

    Name    Age year    HH
0   t       10  100      5
1   t       10  100      8
2   t       10  81       5
3   t       10  81       8
4   n       15  101      5
5   n       15  101      55
6   n       15  81       5
7   n       15  81       55
8   j       14  102      66

I also tried join, but it didn't help and gave the same result.

df1.set_index(['Name','Age'],inplace=True)
df2.set_index(['Name','Age'],inplace=True)
df2.join(df1)
user14269252

1 Answer

You were close. As in your last attempt, set the new indexes first, and then use pd.concat instead of join:

df1.set_index(['Name','Age'],inplace=True)
df2.set_index(['Name','Age'],inplace=True)
pd.concat([df1,df2], axis=1)

which outputs:

          HH  year
Name Age          
t    10    5   100
n    15    5   101
j    14   66   102
t    10    8    81
n    15   55    81
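
The merge in the question returns 9 rows because each repeated (Name, Age) pair in one frame is matched with every row carrying the same pair in the other (2 x 2 for 't', 2 x 2 for 'n', 1 for 'j'). The concat above pairs rows up instead, which, as far as I can tell, relies on both frames having the same index values in the same order. If that cannot be guaranteed, one possible alternative is to number each repeat of a key pair and merge on that number too. This is only a sketch: the helper column name 'occ' is made up for illustration, and it assumes the n-th occurrence of a pair in df1 belongs with the n-th occurrence in df2.

```python
import pandas as pd

df1 = pd.DataFrame(
    [['t', 10, 5], ['n', 15, 5], ['j', 14, 66], ['t', 10, 8], ['n', 15, 55]],
    columns=['Name', 'Age', 'HH'],
)
df2 = pd.DataFrame(
    [['t', 10, 100], ['n', 15, 101], ['j', 14, 102], ['t', 10, 81], ['n', 15, 81]],
    columns=['Name', 'Age', 'year'],
)

keys = ['Name', 'Age']

# 'occ' is a hypothetical helper column: 0 for the first time a
# (Name, Age) pair appears in a frame, 1 for the second, and so on.
df1['occ'] = df1.groupby(keys).cumcount()
df2['occ'] = df2.groupby(keys).cumcount()

# Merging on the keys plus the occurrence number matches rows one-to-one
# instead of producing every combination of the duplicated keys.
res = pd.merge(df2, df1, on=keys + ['occ'], how='inner').drop(columns='occ')
print(res)
```

With the sample data this gives one row per input row (5 rows) rather than 9, and it does not require setting an index first.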
Mat.B