I have two files that share three columns: Date, KeywordId, and AdGroupId. I want to merge them on these columns so that, for each row in the first file with a particular Date, KeywordId, and AdGroupId, if there is a corresponding row with the same Date, KeywordId, and AdGroupId in the second file, the remaining values from the second file are appended; if there is no match, the remaining columns should just be filled with null (or -).
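To make the expected behaviour concrete, here is a small toy example of what I want (Clicks and Revenue are made-up column names, just standing in for the extra columns each file carries):

import pandas as pd

# Toy version of file 1; Clicks stands in for its extra columns
df1 = pd.DataFrame({
    'Date': ['2018-01-01', '2018-01-01', '2018-01-02'],
    'KeywordId': [1, 2, 1],
    'AdGroupId': [10, 20, 10],
    'Clicks': [5, 3, 7],
})

# Toy version of file 2; Revenue stands in for its extra columns
df2 = pd.DataFrame({
    'Date': ['2018-01-01', '2018-01-02'],
    'KeywordId': [1, 1],
    'AdGroupId': [10, 10],
    'Revenue': [100.0, 150.0],
})

# Desired result: every row of df1 is kept; Revenue is filled in where the
# (Date, KeywordId, AdGroupId) combination exists in df2, and is null otherwise:
#
#          Date  KeywordId  AdGroupId  Clicks  Revenue
# 0  2018-01-01          1         10       5    100.0
# 1  2018-01-01          2         20       3      NaN
# 2  2018-01-02          1         10       7    150.0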
The first file (df1 here) has 5,900,000 rows. The second file has around 1,000,000 rows. I used the code below:
import pandas as pd
df1 = pd.read_csv(r"C:\Users\Rakshit Lal\Desktop\QVC Data\psnb_extract_daily\Final\cumulative_adwords_test.csv")
df2 = pd.read_csv(r"C:\Users\Rakshit Lal\Desktop\QVC Data\psnb_extract_daily\Final\Test_psnbfull.csv")
# Merge the two dataframes on Date, KeywordId and AdGroupId
df3 = pd.merge(df1, df2, on=['Date', 'KeywordId', 'AdGroupId'])
df3.set_index('Date', inplace=True)
# Write it to a new CSV file
df3.to_csv('CSV3.csv')
My final file, CSV3, contains only 605,277 rows for some reason, when it should contain 5,900,000 rows (the same as file 1). I believe I'm making an error in the way I'm using the merge function. Can someone help me figure out where I'm going wrong and what I should modify?