0

I working on merging to different a CSV file as:

input a.csv:

myID name  day age
111  james Feb 29
112  alexs Mar 19
113  kevin Jan 18
114  bratt Jul 22
115  nancy Apr 30

Input b.csv:

myID visitedCity
111  boston
111  san fran
111  paris
112  texas
113  fargo
114  austin   
114  los alto
115  memphis

As it shows the file b.CSV has the same myID column names that used in a.CSV and contains similar id's, however, file b.CSV contains duplicated ID's which shows how many cities visited.
Question is: how I can merge the data from the input a.CSV to the b.CSV like follows:

myID visitedCity name  day age
111  boston      james Feb 29
111  san fran    james Feb 29
111  paris       james Feb 29
112  texas       alexs Mar 19
113  fargo       kevin Jan 18
114  austin      bratt Jul 22
114  los alto    bratt Jul 22
115  memphis     nancy Apr 30

I already checked some posts such as this but it's not what I want.

Bilgin
  • 499
  • 1
  • 10
  • 25

3 Answers3

1

Use the merge command.

>>> import pandas as pandas
>>> adf = pandas.DataFrame([[111, 'james', 'Feb', 29]])
>>> adf.columns = ['myID', 'name', 'day', 'age']
>>> adf
   myID   name  day  age
0   111  james  Feb   29
>>> bdf = pandas.DataFrame([[111, 'boston'], [111, 'san fran']])
>>> bdf.columns = ['myID', 'visitiedCity']
>>> bdf
   myID visitiedCity
0   111       boston
1   111     san fran
>>> df = pandas.merge(bdf, adf, how='outer', on=['myID'], indicator=False)
>>> df
   myID visitiedCity   name  day  age
0   111       boston  james  Feb   29
1   111     san fran  james  Feb   29

More information can be found https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.merge.html

xdou
  • 66
  • 4
1

DataFrame and Pandas will be the best way to do this. However, I was trying to find out if this can also be done using basic loops. Here's what I was able to do using list comprehension.

I am going to use each row as a string. Then I need to compare only the first 4 characters in each row. Here's how I did it. It is just a two line code.

x = [
'myID name  day age',
'111  james Feb 29',
'112  alexs Mar 19',
'113  kevin Jan 18',
'114  bratt Jul 22',
'115  nancy Apr 30']
y = [
'myID visitedCity',
'111  boston',
'111  san fran',
'111  paris',
'112  texas',
'113  fargo',
'114  austin',
'114  los alto',
'115  memphis']

k = [i.ljust(16,' ')+j[4:] for i in y for j in x if i[:3] == j[:3]]
for x in k: print(x)

Output is as follows:

myID visitedCity name  day age
111  boston      james Feb 29
111  san fran    james Feb 29
111  paris       james Feb 29
112  texas       alexs Mar 19
113  fargo       kevin Jan 18
114  austin      bratt Jul 22
114  los alto    bratt Jul 22
115  memphis     nancy Apr 30 
Joe Ferndz
  • 8,417
  • 2
  • 13
  • 33
0

Try

import pandas as pd
aDf= pd.read_csv('a.csv')
bDf =pd.read_csv('b.csv')

ResultDf=pd.merge( bDf, aDf, how='left', on='myID')
Kuldip Chaudhari
  • 1,112
  • 4
  • 8