How can I join these two data frames on the date column without producing many duplicated rows?

import pandas as pd

data = {'date': ['01/01/2018', '02/02/2019', '01/04/2019', '16/02/2019', '20/03/2019'], 'Age': [20, 21, 19, 18, 34]}

# Create DataFrame
df1 = pd.DataFrame(data)
df1

DF2

data2 = {'date':['01/01/2018', '04/07/2019', '01/04/2019', '18/02/2018'], 'miles':[50, 81, 99, 109]} 

# Create DataFrame 
df2 = pd.DataFrame(data2) 
df2

Final result should look like this:

finaldata = {'date':['01/01/2018', '02/02/2019', '01/04/2019', '16/02/2019','20/03/2019'], 'Age':[20, 21, 19, 18,34], 'miles':[50, 'NAN', 99, 'NAN', 'NAN']} 

# Create DataFrame 
final_df = pd.DataFrame(finaldata) 
final_df

I have tried this code on my datasets, but it creates many duplicated rows:

df1.merge(df2)

LivingstoneM

1 Answer

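Use the "how" argument to do a left join. It keeps every row of df1 and fills miles with NaN where df2 has no matching date. Passing on='date' makes the join key explicit:

df1.merge(df2, on='date', how='left')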

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.merge.html
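
A left join alone will still repeat rows of df1 if the same date appears more than once in df2. The sample data in the question has no repeated dates, so the following is only a sketch under the assumption that the real df2 does. The names df2_unique and result, and the choice to keep the first row per date, are illustrative and not from the question:

```python
import pandas as pd

df1 = pd.DataFrame({
    'date': ['01/01/2018', '02/02/2019', '01/04/2019', '16/02/2019', '20/03/2019'],
    'Age': [20, 21, 19, 18, 34],
})
df2 = pd.DataFrame({
    'date': ['01/01/2018', '04/07/2019', '01/04/2019', '18/02/2018'],
    'miles': [50, 81, 99, 109],
})

# Assumption: if df2 can contain repeated dates, keep one row per date
# (here the first one) so that each df1 row matches at most one df2 row.
df2_unique = df2.drop_duplicates(subset='date', keep='first')

# Left join on the date column. validate='many_to_one' makes pandas raise
# a MergeError if the dates in the right frame are not unique.
result = df1.merge(df2_unique, on='date', how='left', validate='many_to_one')

print(result)
```

If keeping the first row is not what you want, aggregating df2 per date (for example with groupby) before merging is another option.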

Dan