
I have two data frames with the same number of rows (1434), and I'd like to concatenate them along axis 1:

res = pd.concat([df_resolved, df1], axis=1)

The two data frames do not have any columns that have the same name. I'd just like to join them like:

df1:        df2:
col1 col2 | col3 col4
1    0    | 9    0
6    0    | 0    0

=
concatenated_df:
col1 col2 col3 col4
1    0    9    0
6    0    0    0
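
For reference, a minimal runnable version of this toy example (names and values taken from the illustration above) behaves as expected:

import pandas as pd

df1 = pd.DataFrame({"col1": [1, 6], "col2": [0, 0]})
df2 = pd.DataFrame({"col3": [9, 0], "col4": [0, 0]})

# Both frames share the default RangeIndex (0, 1), so the rows pair up
concatenated_df = pd.concat([df1, df2], axis=1)
print(concatenated_df)
#    col1  col2  col3  col4
# 0     1     0     9     0
# 1     6     0     0     0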

This works fine on a small example like this, but for some reason I end up with many NaN rows when I try it on my original dataset, which is too big for me to inspect manually (I'm trying to join data frames of shape 1434x24 and 1434x17458). So the outcome looks something like:

concatenated_df:
col1 col2 col3 col4
1    0    9    0
6    0    0    0
NaN  NaN  0    0

But I don't see why. Do you have any idea how this can occur? I've tried renaming all the columns in the smaller data frame by appending an _xyz suffix to the column names, but the issue stays the same.

lte__
  • https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples – BENY Aug 30 '17 at 15:05

2 Answers


The answer to a similar question here might help: pandas concat generates nan values

Briefly, if the row indices for the two dataframes have any mismatches, the concatenated dataframe will have NaNs in the mismatched rows. If you don't need to keep the indices the way they are, using df.reset_index(drop=True, inplace=True) on both datasets before concatenating should fix the problem.
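A minimal sketch of that fix, assuming the frames are named df_resolved and df1 as in the question:

# If this prints False, the row indices don't line up and concat will
# produce extra NaN rows
print(df_resolved.index.equals(df1.index))

# Drop the old indices so both frames use 0..1433, then concatenate
res = pd.concat(
    [df_resolved.reset_index(drop=True), df1.reset_index(drop=True)],
    axis=1,
)
print(res.shape)  # should be (1434, 24 + 17458)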

ada's_human

I used to have the same problem when I generated the training and testing sets. This is my solution; however, I do not know why pd.concat does not work in this situation either...

import pandas as pd

# Convert both frames to plain lists of row values
l1 = df.values.tolist()
l2 = df_resolved.values.tolist()

# Append each row of df_resolved to the corresponding row of df, pairing rows by position
for i in range(len(l1)):
    l1[i].extend(l2[i])

# Rebuild a single data frame with the combined columns
df = pd.DataFrame(l1, columns=df.columns.tolist() + df_resolved.columns.tolist())
BENY