Include unmatched rows in Python script that merges two files based on one column

Question

I have two csv files shown below.

First file:

abTestGroup,platform,countryCode,userId

group_control,ios,GB,aaaaaaaaaaa group_control,ios,GB,aaaaaaaaaaa

group_control,ios,GB,aaaaaaaaaaa group_control,ios,GB,aaaaaaaaaaa

group_test,android,GB,ccccccccccc

Second file:

dateActivity,productId,cost,userId

2018-03-02,specialpack,0.198,aaaaaaaaaaa

2018-03-03,specialpack,0.498,aaaaaaaaaaa

2018-03-02,specialpack,0.398,bbbbbbbbbbb

2018-03-02,specialpack,0.998,ccccccccccc

and they have one common thing in this case which is the userId.

I want to merge those files and create a parent-child relationship using Python (Pandas).

I have used the script below:

import pandas as pd

a = pd.read_csv('PARENT.csv', encoding = "UTF-8", mangle_dupe_cols=True, usecols=['abTestGroup','platform','countryCode','userId'])
b = pd.read_csv("CHILD.csv")
merged = b.merge(a, on='userId', how='inner')

merged = merged.drop_duplicates()

merged.to_csv("final_output.csv", index=False)

in order to get the following output:

dateActivity,productId,cost,userId,abTestGroup,platform,countryCode

2018-03-02,specialpack,0.198,aaaaaaaaaaa,group_control,ios,GB

2018-03-03,specialpack,0.498,aaaaaaaaaaa,group_control,ios,GB

2018-03-02,specialpack,0.998,ccccccccccc,group_test,android,GB

The userId 'bbbbbbbbbbb' does not appear in the final output because it doesn't exist in both files. How can I include the unmatched rows (unmatched userIds) and assign the 'Other' value to the cells?

see this on [how-to-make-good-reproducible-pandas-examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) — akshat, Apr 26 '18 at 14:16

score 0 · Accepted Answer · answered Apr 26 '18 at 14:11

Because you are doing an inner join, only the matched userIds will be return. In order to return unmatched userIds, you can do a right/left join depending on the sequence of join in the following statement.

merged = b.merge(a, on='userId', how='inner')

If b contains bbbbbbbbbbb you would do a left join. If a contains bbbbbbbbbbb then you would do a right join.

merged = b.merge(a, on='userId', how='left')
merged = b.merge(a, on='userId', how='right')

Please see the join types below.

Yes. I skipped that. I should take more breaks :P SQL was the basic knowledge needed here. — Datacrawler, Apr 26 '18 at 14:45

score 0 · Answer 2 · answered Apr 26 '18 at 14:14

0

join method will work for your case:

a.join(b)

answered Apr 26 '18 at 14:14

akshat

1,219
1
8
24

Include unmatched rows in Python script that merges two files based on one column

2 Answers2