Appending pandas dataframe issue

Question

I have two dataframes, each with 2100 rows × 857 columns. I want to append the second one to the first.

I use `X_train_features = X_train_features.append(X_train_Specfeatures, ignore_index=True)` for this. But instead of getting 4200 rows × 857 columns, I get a dataframe of 4200 rows × 1714 columns.

Check out below images.

This is first dataframe.

This is the 2nd one.

The output I get by appending is

I am not able to understand what is wrong.

This usually happens when the column names are different. Can you please include the output of `X_train_features.info()` and `X_train_Specfeatures.info()`? Copy the text and paste it into a code block. – RichieV Sep 04 '20 at 14:24

score 2 · Answer 1 · answered Sep 04 '20 at 14:27

In fact, what you want is to concatenate the two dataframes.

You can use `pd.concat()`:

pd.concat([first_df,second_df],axis=0)
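
As a minimal sketch of that call, here are two small stand-in frames (the column names `f0` and `f1` and the data are made up for illustration, not the OP's). Passing `ignore_index=True` renumbers the rows, which matches what the OP asked `append` to do. `DataFrame.append` was deprecated in pandas 1.4 and removed in 2.0, so `pd.concat` is also the spelling that keeps working on newer versions. Note that this only gives the expected shape when both frames carry the same column labels.

```python
import pandas as pd

# Small stand-ins for the two 2100 x 857 frames (hypothetical data)
first_df = pd.DataFrame({"f0": [1, 2], "f1": [3, 4]})
second_df = pd.DataFrame({"f0": [5, 6], "f1": [7, 8]})

# axis=0 stacks rows; ignore_index=True renumbers the rows 0..n-1
combined = pd.concat([first_df, second_df], axis=0, ignore_index=True)

print(combined.shape)        # (4, 2)
print(list(combined.index))  # [0, 1, 2, 3]
```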

Ala Bayoudh

score 1 · Answer 2 · answered Sep 04 '20 at 14:23

The usual way to stack two dataframes is the pandas `pd.concat()` function. Specify `axis=0` to place the rows of one dataframe under the other, with the columns matched by label:

import pandas as pd

df1 = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
df2 = pd.DataFrame({"a": [7, 8, 9], "b": [10, 11, 12]})
pd.concat([df1, df2], axis=0)

Raphaël Gervillié

if the problem is that the columns are not labeled the same in both dfs, then `concat` would have the same effect as `append` wouldn't it? – RichieV Sep 04 '20 at 14:28
@RichieV if your columns are not named the same thing, you need to correct that. – Paul H Sep 04 '20 at 14:29
@PaulH that is precisely my point, the OP seems to be having that problem – RichieV Sep 04 '20 at 14:30
You can then give df2 the same labels as df1, for example with `df2.columns = df1.columns`, but make sure that the columns are in the right order. – Raphaël Gervillié Sep 04 '20 at 14:33
@RichieV Oops. Sorry. I thought you were the OP – Paul H Sep 04 '20 at 14:35

score 1 · Answer 3 · answered Sep 04 '20 at 14:40

Perhaps you can solve your specific problem with

X_train_Specfeatures.columns = X_train_features.columns
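
Assigning labels this way is positional: it overwrites the existing labels without looking at them, so it is only safe when both frames have the same number of columns in the same order. A minimal sketch with small stand-in frames (the names `features` and `spec_features` and their data are hypothetical, not the OP's):

```python
import pandas as pd

# Hypothetical stand-ins for the OP's frames: same layout, different labels
features = pd.DataFrame([[1, 2], [3, 4]], columns=[0, 1])
spec_features = pd.DataFrame([[5, 6], [7, 8]], columns=["0", "1"])

# Positional relabeling only makes sense if the column counts match
if features.shape[1] != spec_features.shape[1]:
    raise ValueError("column counts differ; cannot relabel positionally")

# Work on a copy so the original frame keeps its labels
relabeled = spec_features.copy()
relabeled.columns = features.columns

combined = pd.concat([features, relabeled], ignore_index=True)
print(combined.shape)  # (4, 2)
```

The count check does not prove the columns mean the same thing in both frames; that still has to be confirmed from how the features were built.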

Background

As mentioned in the comments, that usually happens when the column labels are not the same for both dfs.

Take these two dfs

import pandas as pd

df = pd.DataFrame([[0, 1], [2, 3]])
df2 = df.copy()

If you append (or concat, all the same), you will get a 4x2 df because the column labels are exactly the same.

# df_out = df.append(df2, ignore_index=True)
df_out = pd.concat([df, df2], ignore_index=True)

print(df_out)

   0  1
0  0  1
1  2  3
2  0  1
3  2  3

But if you change the column names in one df you will get a 4x4 df, because pandas tries to align the column labels.

df2.columns = ['0', '1']

# df_out = df.append(df2, ignore_index=True)
df_out = pd.concat([df, df2], ignore_index=True)

print(df_out)

     0    1    0    1
0  0.0  1.0  NaN  NaN
1  2.0  3.0  NaN  NaN
2  NaN  NaN  0.0  1.0
3  NaN  NaN  2.0  3.0

Notice that even though the column names print the same, they are actually different values: in one df `0` is an integer and in the other it is a string. So pandas treats them as different columns, and since each df has no values for the other's columns, it fills those cells with NaN.
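
One way to confirm this kind of mismatch is to compare the two column indexes and look at the Python type of each label. The sketch below reuses the toy frames from this answer; converting the labels with `int()` is just one possible fix and assumes every string label really is an integer, which may not hold for the OP's data.

```python
import pandas as pd

df = pd.DataFrame([[0, 1], [2, 3]])
df2 = df.copy()
df2.columns = ["0", "1"]

# The labels print alike but do not compare equal
print(df.columns.equals(df2.columns))  # False

# Inspect the Python type of each label to see why
print([isinstance(c, str) for c in df.columns])   # [False, False]
print([isinstance(c, str) for c in df2.columns])  # [True, True]

# One possible fix when the string labels are really integers
df2_fixed = df2.copy()
df2_fixed.columns = [int(c) for c in df2.columns]

df_out = pd.concat([df, df2_fixed], ignore_index=True)
print(df_out.shape)  # (4, 2)
```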

You can read more in this question about Pandas Merging 101
