
I have two dataframes, each with 2100 rows × 857 columns. I want to append the second one to the first one.

I use X_train_features = X_train_features.append(X_train_Specfeatures, ignore_index=True)
for this. But instead of getting 4200 rows × 857 columns, I get a dataframe of 4200 rows × 1714 columns.

[Screenshots: first dataframe; second dataframe; result after appending]

I don't understand what is going wrong.

adikh
This usually happens when the column names are different. Can you please include the output of `X_train_features.info()` and `X_train_Specfeatures.info()`? Copy the text and paste it into a code block. – RichieV Sep 04 '20 at 14:24
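A minimal sketch of the check the comment is asking for, comparing the column labels directly instead of eyeballing them. The frames `left` and `right` are hypothetical stand-ins for X_train_features and X_train_Specfeatures: same shape and same printed labels, but integer labels in one and string labels in the other.

```python
import pandas as pd

# Hypothetical stand-ins: labels print the same but have different types.
left = pd.DataFrame([[0, 1], [2, 3]])                       # integer labels 0, 1
right = pd.DataFrame([[4, 5], [6, 7]], columns=["0", "1"])  # string labels "0", "1"

# True only if both frames have identical labels in identical order.
same_labels = left.columns.equals(right.columns)
print(same_labels)

# Labels present in one frame but missing from the other.
only_left = left.columns.difference(right.columns)
only_right = right.columns.difference(left.columns)
print(list(only_left), list(only_right))

# The dtype of each column Index often reveals an int-vs-str mismatch.
print(left.columns.dtype, right.columns.dtype)
```

If `same_labels` is False, append/concat will treat the mismatched labels as separate columns.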

3 Answers


In fact, what you want is to concatenate the two dataframes.

You can use pd.concat():

import pandas as pd

pd.concat([first_df, second_df], axis=0)
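`DataFrame.append` was deprecated in pandas 1.4 and removed in pandas 2.0, so `pd.concat` is the replacement going forward. A minimal sketch with two small hypothetical frames whose labels match; `ignore_index=True` mirrors the `ignore_index=True` in the question's `append` call and renumbers the rows 0..n-1:

```python
import pandas as pd

# Hypothetical frames with identical column labels.
first_df = pd.DataFrame({"x": [1, 2], "y": [3, 4]})
second_df = pd.DataFrame({"x": [5, 6], "y": [7, 8]})

# Equivalent of first_df.append(second_df, ignore_index=True) in older pandas.
combined = pd.concat([first_df, second_df], axis=0, ignore_index=True)

print(combined.shape)        # (4, 2): rows stacked, columns unchanged
print(list(combined.index))  # [0, 1, 2, 3]
```

This stacks cleanly only because the labels match; with mismatched labels, `pd.concat` widens the result exactly as `append` did.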

The usual way to stack two dataframes is the pandas concat() function. With axis=0 (the default), the rows are stacked and the columns are matched by their labels:

import pandas as pd

df1 = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
df2 = pd.DataFrame({"a": [7, 8, 9], "b": [10, 11, 12]})
pd.concat([df1, df2], axis=0)
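To see what `axis` controls, a quick sketch comparing both settings on the same `df1` and `df2`:

```python
import pandas as pd

df1 = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
df2 = pd.DataFrame({"a": [7, 8, 9], "b": [10, 11, 12]})

# axis=0 (the default): stack rows, aligning columns by label.
rows = pd.concat([df1, df2], axis=0, ignore_index=True)
print(rows.shape)          # (6, 2)

# axis=1: place the frames side by side, aligning rows by index label.
cols = pd.concat([df1, df2], axis=1)
print(cols.shape)          # (3, 4)
print(list(cols.columns))  # ['a', 'b', 'a', 'b']
```

Note that the question's 4200 × 1714 result is neither of these: axis=1 would give 2100 × 1714. Doubling both dimensions is the signature of axis=0 stacking where none of the column labels match.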

Perhaps you can solve your specific problem with

X_train_Specfeatures.columns = X_train_features.columns
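Assigning `.columns` relabels purely by position, so it is only safe when both frames have the same number of columns in the same order. A minimal sketch with a hypothetical helper, `stack_by_position`, that checks the column count before relabelling:

```python
import pandas as pd


def stack_by_position(top: pd.DataFrame, bottom: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: stack `bottom` under `top`, reusing top's labels.

    Assumes column i of `bottom` means the same thing as column i of `top`.
    """
    if top.shape[1] != bottom.shape[1]:
        raise ValueError(
            f"column counts differ: {top.shape[1]} vs {bottom.shape[1]}"
        )
    bottom = bottom.copy()        # avoid mutating the caller's frame
    bottom.columns = top.columns  # relabel by position
    return pd.concat([top, bottom], ignore_index=True)


# Same printed labels, but int labels in one frame and str labels in the other.
a = pd.DataFrame([[0, 1], [2, 3]])
b = pd.DataFrame([[4, 5], [6, 7]], columns=["0", "1"])
print(stack_by_position(a, b).shape)  # (4, 2)
```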

Background

As mentioned in the comments, that usually happens when the column labels are not the same for both dfs.

Take these two dfs:

import pandas as pd

df = pd.DataFrame([[0, 1], [2, 3]])
df2 = df.copy()

If you append (or concat; the result is the same), you get a 4x2 df because the column labels are exactly the same.

# df_out = df.append(df2, ignore_index=True)
df_out = pd.concat([df, df2], ignore_index=True)

print(df_out)

   0  1
0  0  1
1  2  3
2  0  1
3  2  3

But if you change the column names in one df, you get a 4x4 df, because pandas aligns the columns by their labels.

df2.columns = ['0', '1']

# df_out = df.append(df2, ignore_index=True)
df_out = pd.concat([df, df2], ignore_index=True)

print(df_out)

     0    1    0    1
0  0.0  1.0  NaN  NaN
1  2.0  3.0  NaN  NaN
2  NaN  NaN  0.0  1.0
3  NaN  NaN  2.0  3.0

Notice that even though the column names print the same, they are actually different values: in one df 0 is an integer, and in the other it is a string. So pandas treats them as different columns, and since each df has no values for the other df's columns, those cells are filled with NaN (which is also why the integers turn into floats).
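Continuing that example, a minimal sketch of two ways to make the labels line up again: convert the string labels back to integers, or copy the labels from the first df by position.

```python
import pandas as pd

df = pd.DataFrame([[0, 1], [2, 3]])
df2 = df.copy()
df2.columns = ["0", "1"]  # string labels, as in the example above

# Option 1: convert the string labels back to integers.
fixed = df2.copy()
fixed.columns = fixed.columns.astype(int)
print(pd.concat([df, fixed], ignore_index=True).shape)  # (4, 2)

# Option 2: copy the labels from the first df (positional relabel).
relabelled = df2.set_axis(df.columns, axis=1)
print(pd.concat([df, relabelled], ignore_index=True).shape)  # (4, 2)
```

Option 1 is safer when the labels carry meaning (it keeps each column matched by name); option 2 assumes the columns are already in the same order.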

You can read more in the question Pandas Merging 101.

RichieV