0

I have two .csv files "train_id.csv" and "train_ub.csv", I want to load them as pandas dataframes. Their dimensions are different, but they have only common column, let's say:

train_id:

ID  id_01   id_02   id_03   id_04   
1   0.0     1.0     5.2     7.1 
2   0.5     7.7     3.4     9.8 
5   1.5     0.8     1.6     2.5
7   3.0     0.2     3.4     6.3
8   5.5     1.8     7.5     7.0
9   7.2     2.6     9.1     1.1
11  9.5     3.5     2.2     0.3

while train_ub:

ID  ub_01   ub_02   ub_03   ub_04   
1   0.0     1.0     9.2     8.3 
2   1.5     2.7     0.4     4.9 
3   2.7     4.8     7.6     3.7
4   4.8     9.2     2.4     5.4
6   6.0     5.8     5.5     0.6
10  9.1     3.6     4.1     2.0
11  7.3     7.5     0.2     9.5     

One may see that they have in common the first column but there are missing values in each dataframe. Is there a way in pandas to merge them column wise in order to get a dataframe of the form:

ID  id_01   id_02   id_03   id_04   ub_01   ub_02   ub_03   ub_04
1   0.0     1.0     5.2     7.1      0.0     1.0     9.2     8.3
2   0.5     7.7     3.4     9.8      1.5     2.7     0.4     4.9
3   NaN     NaN     NaN     NaN      2.7     4.8     7.6     3.7
4   NaN     NaN     NaN     NaN      4.8     9.2     2.4     5.4
5   1.5     0.8     1.6     2.5      NaN     NaN     NaN     NaN
6   NaN     NaN     NaN     NaN      6.0     5.8     5.5     0.6
7   3.0     0.2     3.4     6.3      NaN     NaN     NaN     NaN
8   5.5     1.8     7.5     7.0      NaN     NaN     NaN     NaN
9   7.2     2.6     9.1     1.1      NaN     NaN     NaN     NaN
10  NaN     NaN     NaN     NaN      9.1     3.6     4.1     2.0
11  9.5     3.5     2.2     0.3      9.5     3.5     2.2     0.3

PS: Notice that this is an oversimplified example, the real databases have the shapes id(144233, 41) and ub(590540, 394).

1 Answers1

0

You could accomplish this using an outer join. Here is the code for it:

train_id = pd.read_csv("train_id.csv")
train_up = pd.read_csv("train_up")
train_merged = train_id.merge(train_ub, on=["ID"], how="outer")