I have two csv files shown below.
First file:
abTestGroup,platform,countryCode,userId
group_control,ios,GB,aaaaaaaaaaa group_control,ios,GB,aaaaaaaaaaa
group_control,ios,GB,aaaaaaaaaaa group_control,ios,GB,aaaaaaaaaaa
group_test,android,GB,ccccccccccc
Second file:
dateActivity,productId,cost,userId
2018-03-02,specialpack,0.198,aaaaaaaaaaa
2018-03-03,specialpack,0.498,aaaaaaaaaaa
2018-03-02,specialpack,0.398,bbbbbbbbbbb
2018-03-02,specialpack,0.998,ccccccccccc
and they have one common thing in this case which is the userId
.
I want to merge those files and create a parent-child relationship using Python (Pandas).
I have used the script below:
import pandas as pd
a = pd.read_csv('PARENT.csv', encoding = "UTF-8", mangle_dupe_cols=True, usecols=['abTestGroup','platform','countryCode','userId'])
b = pd.read_csv("CHILD.csv")
merged = b.merge(a, on='userId', how='inner')
merged = merged.drop_duplicates()
merged.to_csv("final_output.csv", index=False)
in order to get the following output:
dateActivity,productId,cost,userId,abTestGroup,platform,countryCode
2018-03-02,specialpack,0.198,aaaaaaaaaaa,group_control,ios,GB
2018-03-03,specialpack,0.498,aaaaaaaaaaa,group_control,ios,GB
2018-03-02,specialpack,0.998,ccccccccccc,group_test,android,GB
The userId
'bbbbbbbbbbb' does not appear in the final output because it doesn't exist in both files. How can I include the unmatched rows (unmatched userIds
) and assign the 'Other' value to the cells?