I have a MasterList
dataframe
that I merge other data sets into via a loop. Each time I merge a new column is created as with _x
or _y
. How can I just keep these as one column?
import pandas as pd
MasterList = pd.DataFrame(data = [['0001'],['0002'], ['0003'], ['0004']], columns = ['Order Number'])
customer_file1 = pd.DataFrame(data = [['0003', 'M'], ['0004', 'W']], columns = ['Order Number', 'Day'])
customer_file2 = pd.DataFrame(data = [['0001', 'T'], ['0002', 'S']], columns = ['Order Number', 'Day'])
for x in [customer_file1, customer_file2]:
MasterList = pd.merge(MasterList, x, how='left',left_on= 'Order Number',right_on= 'Order Number')
print MasterList
Output:
Order Number Day_x Day_y
0 0001 NaN T
1 0002 NaN S
2 0003 M NaN
3 0004 W NaN
Desired Output:
Order Number Day
0 0001 T
1 0002 S
2 0003 M
3 0004 W
Edit: People wanted more data as I over simplified my example: I know year and day don't really make sense in the data set buy that is ok. Each customer file really come from a query of a different database, so I'd like to make the query from the database then merge in the data and forget about it rather than querying all the customer databases, concatenating, then merging.
import pandas as pd
MasterList = pd.DataFrame(data = [['0001', '2015'],['0002', '2015'], ['0003', '2016'], ['0004', '2015'], ['0005', '2017'], ['0006', '2018']], columns = ['Order Number', 'Year'])
customer_file1 = pd.DataFrame(data = [['0003', 'M'], ['0004', 'W']], columns = ['Order Number', 'Day'])
customer_file2 = pd.DataFrame(data = [['0001', 'T'], ['0002', 'S']], columns = ['Order Number', 'Day'])
customer_file3 = pd.DataFrame(data = [['0005', 'T'], ['0006', 'S']], columns = ['Order Number', 'Day'])
for x in [customer_file1, customer_file2, customer_file3]:
MasterList = pd.merge(MasterList, x, how='left', left_on='Order Number', right_on='Order Number')
print MasterList
output:
Order Number Year Day_x Day_y Day
0 0001 2015 NaN T NaN
1 0002 2015 NaN S NaN
2 0003 2016 M NaN NaN
3 0004 2015 W NaN NaN
4 0005 2017 NaN NaN T
5 0006 2018 NaN NaN S
Desired output:
Order Number Year Day
0 0001 2015 T
1 0002 2015 S
2 0003 2016 M
3 0004 2015 W
4 0005 2017 T
5 0006 2018 S