I am new to Python. I want to merge two DataFrames. I used merge, but the result gets too large because rows with the same keys are duplicated. I created some sample code to show what I did. Is there a better way to do this, ideally with better performance? I would really appreciate your ideas.
import pandas as pd
# Both frames contain repeated (Name, Age) pairs.
data = [['t', 10, 5], ['n', 15, 5], ['j', 14, 66], ['t', 10, 8], ['n', 15, 55]]
df1 = pd.DataFrame(data, columns=['Name', 'Age', 'HH'])
data = [['t', 10, 100], ['n', 15, 101], ['j', 14, 102], ['t', 10, 81], ['n', 15, 81]]
df2 = pd.DataFrame(data, columns=['Name', 'Age', 'year'])
# Inner merge on the two key columns.
res = pd.merge(df2, df1, on=['Name', 'Age'], how='inner')
Result:
  Name  Age  year  HH
0    t   10   100   5
1    t   10   100   8
2    t   10    81   5
3    t   10    81   8
4    n   15   101   5
5    n   15   101  55
6    n   15    81   5
7    n   15    81  55
8    j   14   102  66
I also tried join on the indexed DataFrames, but it did not help and gave the same result:
df1.set_index(['Name','Age'],inplace=True)
df2.set_index(['Name','Age'],inplace=True)
df2.join(df1)
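For context, both merge and join return nine rows here because ('t', 10) and ('n', 15) each appear twice in both frames, so every matching pair is combined (2 x 2 rows per key, plus one for 'j'). Below is a minimal sketch of one possible way to avoid that, under an assumption that is not stated above: that the first occurrence of a key in one frame should be paired with the first occurrence in the other, the second with the second, and so on. The helper column name occ and the variable names are made up for illustration. The frames are rebuilt so the snippet runs on its own.

```python
import pandas as pd

df1 = pd.DataFrame(
    [['t', 10, 5], ['n', 15, 5], ['j', 14, 66], ['t', 10, 8], ['n', 15, 55]],
    columns=['Name', 'Age', 'HH'],
)
df2 = pd.DataFrame(
    [['t', 10, 100], ['n', 15, 101], ['j', 14, 102], ['t', 10, 81], ['n', 15, 81]],
    columns=['Name', 'Age', 'year'],
)

keys = ['Name', 'Age']

# Plain merge: duplicate keys on both sides give a many-to-many match.
full = df2.merge(df1, on=keys, how='inner')

# Assumption: pair rows by their order of occurrence within each key.
# 'occ' is a hypothetical helper column: 0 for the first occurrence
# of a (Name, Age) pair, 1 for the second, and so on.
left = df2.assign(occ=df2.groupby(keys).cumcount())
right = df1.assign(occ=df1.groupby(keys).cumcount())

# Merging on the keys plus the occurrence counter matches each row once.
paired = left.merge(right, on=keys + ['occ'], how='inner').drop(columns='occ')

print(len(full), len(paired))
print(paired)
```

If the pairing-by-order assumption does not match the data, another option may be to deduplicate or aggregate one side before merging. Passing validate='one_to_one' (or 'one_to_many') to merge makes pandas raise an error when the keys are not unique, which can help catch this kind of row multiplication early.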