
I have two data frames with the same number of rows (1434), and I'd like to concatenate them along axis 1:

res = pd.concat([df_resolved, df1], axis=1)

The two data frames do not have any columns that have the same name. I'd just like to join them like:

df1:        df2:
col1 col2 | col3 col4
1    0    | 9    0
6    0    | 0    0

=
concatenated_df:
col1 col2 col3 col4
1    0    9    0
6    0    0    0
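
For reference, a minimal runnable version of this toy example (names and values taken from the illustration above) behaves as expected:

import pandas as pd

df1 = pd.DataFrame({"col1": [1, 6], "col2": [0, 0]})
df2 = pd.DataFrame({"col3": [9, 0], "col4": [0, 0]})

# Both frames share the default RangeIndex (0, 1), so the rows pair up
concatenated_df = pd.concat([df1, df2], axis=1)
print(concatenated_df)
#    col1  col2  col3  col4
# 0     1     0     9     0
# 1     6     0     0     0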

This works fine on a small example like this, but for some reason I end up with many NaN rows when I try it on my original dataset, which is too big for me to inspect manually (I'm trying to join data frames of shape 1434x24 and 1434x17458). So the outcome looks something like:

concatenated_df:
col1 col2 col3 col4
1    0    9    0
6    0    0    0
NaN  NaN  0    0

But I don't see why. Do you have any idea how this can occur? I've tried renaming all the columns in the smaller data frame by appending an _xyz suffix to the column names, but the issue stays the same.

lte__
  • https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples – BENY Aug 30 '17 at 15:05

2 Answers


The answer to a similar question here might help: pandas concat generates nan values

Briefly, if the row indices for the two dataframes have any mismatches, the concatenated dataframe will have NaNs in the mismatched rows. If you don't need to keep the indices the way they are, using df.reset_index(drop=True, inplace=True) on both datasets before concatenating should fix the problem.
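A minimal sketch of that fix, assuming the frames are named df_resolved and df1 as in the question:

# If this prints False, the row indices don't line up and concat will
# produce extra NaN rows
print(df_resolved.index.equals(df1.index))

# Drop the old indices so both frames use 0..1433, then concatenate
res = pd.concat(
    [df_resolved.reset_index(drop=True), df1.reset_index(drop=True)],
    axis=1,
)
print(res.shape)  # should be (1434, 24 + 17458)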

ada's_human

I used to have the same problem when I generated the training and testing sets. This is my solution; however, I do not know why pd.concat does not work in this situation either...

import pandas as pd

# Convert both frames to plain lists of row values
l1 = df.values.tolist()
l2 = df_resolved.values.tolist()

# Append each row of df_resolved to the corresponding row of df, pairing rows by position
for i in range(len(l1)):
    l1[i].extend(l2[i])

# Rebuild a single data frame with the combined columns
df = pd.DataFrame(l1, columns=df.columns.tolist() + df_resolved.columns.tolist())
BENY