Concatenating two DataFrames with the same number of rows creates one with a different number of rows

Question

For example, this is what I expect to get:

[Image: an example of the expected result]

In my real data, although the two DataFrames I concatenate have the same number of rows, the result has more rows than either of them.

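import pandas as pd
from sklearn.preprocessing import Normalizer

# df is the original DataFrame: the first 10 columns are numeric, the rest categorical
df_numeric = df.iloc[:, 0:10]
numeric_cols = df_numeric.columns.tolist()
df_categorial = df.iloc[:, 10:]

transformer = Normalizer().fit(df_numeric)  # fit does nothing (Normalizer is stateless)
df_numeric = transformer.transform(df_numeric)  # returns a NumPy array, without df's index
df_numeric = pd.DataFrame(df_numeric)  # gets a new default index 0..n-1
df_numeric.columns = numeric_cols
df = pd.concat([df_numeric, df_categorial], axis=1)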

What I get instead:

[Image: my real DataFrame after the concat]
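The symptom can be reproduced with a small hypothetical frame. This sketch assumes that `df` no longer has a default 0..n-1 index, for example because some rows were dropped earlier. To avoid a scikit-learn dependency, it uses NumPy row-wise L2 normalization as a stand-in for `Normalizer()`, whose default is `norm='l2'`. The column names and values are made up.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 4 rows, but the index is [0, 2, 5, 7] because
# some rows were dropped earlier (e.g. by dropna or filtering).
df = pd.DataFrame(
    {
        "a": [1.0, 3.0, 0.5, 2.0],
        "b": [2.0, 4.0, 1.5, 2.0],
        "color": ["red", "blue", "red", "green"],
    },
    index=[0, 2, 5, 7],
)

df_numeric = df.iloc[:, 0:2]
df_categorial = df.iloc[:, 2:]

# Row-wise L2 normalization, a stand-in for Normalizer().transform();
# like the sklearn transformer, this yields a plain NumPy array.
arr = df_numeric.to_numpy()
arr = arr / np.linalg.norm(arr, axis=1, keepdims=True)

# Wrapping the array gives a fresh index 0..3, while df_categorial
# still carries the original labels [0, 2, 5, 7].
df_numeric = pd.DataFrame(arr, columns=["a", "b"])

# axis=1 aligns on index labels, so the result covers their union.
result = pd.concat([df_numeric, df_categorial], axis=1)
print(len(df_numeric), len(df_categorial), len(result))  # 4 4 6
```

Only labels 0 and 2 appear in both parts. Labels 1 and 3 get NaN in `color`, and labels 5 and 7 get NaN in `a` and `b`.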

I tried what Vincent suggested:

df_numeric.reset_index(inplace=True, drop=True)
df_categorial.reset_index(inplace=True, drop=True)
df = pd.concat([df_numeric, df_categorial], axis=1)

I think it works now. However, I don't understand why it caused a problem in the first place: before I reset the indexes, they were the same in both DataFrames.
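Whether two indexes really are the same can be checked directly before concatenating. The following is a minimal sketch. The helper name `report_index_mismatch` and the sample frames are hypothetical. It uses only `Index.equals` and `Index.difference`.

```python
import pandas as pd


def report_index_mismatch(left: pd.DataFrame, right: pd.DataFrame) -> dict:
    """Summarize how the row labels of two DataFrames differ."""
    return {
        # True only if both indexes hold the same labels in the same order
        "equal": left.index.equals(right.index),
        # Labels present in one frame but missing from the other
        "only_left": left.index.difference(right.index).tolist(),
        "only_right": right.index.difference(left.index).tolist(),
    }


a = pd.DataFrame({"x": [1, 2, 3]})                   # default index 0, 1, 2
b = pd.DataFrame({"y": [4, 5, 6]}, index=[0, 2, 5])  # same length, other labels
print(report_index_mismatch(a, b))
# {'equal': False, 'only_left': [1], 'only_right': [5]}
```

Equal lengths do not imply equal labels. `len(a) == len(b)` is true here, but `pd.concat([a, b], axis=1)` would still return 4 rows.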

pd.concat with axis=1 aligns the rows on their index labels. If your rows are already in the order you want, call reset_index() on both DataFrames before the concat — Vincent, Dec 27 '19 at 15:12
Nothing. reset_index just resets the index. See https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.reset_index.html — Vincent, Dec 27 '19 at 15:16
Please post sample data of `df` and `df_categorical` for [reproducible example](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples). — Parfait, Dec 27 '19 at 16:35
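Instead of resetting both indexes, another option is to keep the original index when wrapping the transformed array. You can do this by passing `index=` to the `pd.DataFrame` constructor. This is a sketch under stated assumptions: the data is hypothetical, and NumPy row-wise L2 normalization stands in for `Normalizer()` so that scikit-learn is not needed.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with a non-default index (rows 1, 3 and 4 were dropped).
df = pd.DataFrame(
    {"a": [3.0, 1.0, 2.0], "b": [4.0, 1.0, 0.0], "color": ["red", "blue", "red"]},
    index=[0, 2, 5],
)
df_numeric = df.iloc[:, 0:2]
df_categorial = df.iloc[:, 2:]

# Row-wise L2 normalization (stand-in for Normalizer().transform()).
arr = df_numeric.to_numpy()
arr = arr / np.linalg.norm(arr, axis=1, keepdims=True)

# Reuse the original labels and column names so both parts align row for row.
df_numeric = pd.DataFrame(arr, columns=df_numeric.columns, index=df_numeric.index)

result = pd.concat([df_numeric, df_categorial], axis=1)
print(result.shape)  # (3, 3)
```

Unlike `reset_index(drop=True)`, this keeps the original labels. The result can therefore still be lined up with other data indexed like `df`.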

score 1 · Answer 1 · answered Dec 27 '19 at 15:24

You can use merge on a shared key column instead. Here is an example:

import pandas as pd

df_numeric = pd.DataFrame(
    {
        'index' : [1,2,3],
        'age': [13,60,30],
        'weight': [50, 80, 70]
    }
)

df_categorical = pd.DataFrame(
    {
        'index' : [1,2,3],
        'has_car': [1,1,1],
        'has_pet': [1, 0, 0],
        'has_brother': [1, 1, 0]
    }
)

df = df_numeric.merge(df_categorical, on='index')
print(df)
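`merge` aligns on the key column rather than the index, but it can also produce extra rows. If a key value appears more than once in either frame, every matching pair becomes its own row. Passing `validate="one_to_one"` makes pandas raise `MergeError` in that case instead of silently growing the result. The duplicate key below is made up to show this.

```python
import pandas as pd

df_numeric = pd.DataFrame({"index": [1, 2, 3], "age": [13, 60, 30]})
# Hypothetical duplicate key: 2 appears twice.
df_categorical = pd.DataFrame({"index": [1, 2, 2, 3], "has_pet": [1, 0, 1, 0]})

# Without validation, the duplicate key silently adds a row.
print(len(df_numeric.merge(df_categorical, on="index")))  # 4

# With validation, the non-unique key is rejected.
try:
    df_numeric.merge(df_categorical, on="index", validate="one_to_one")
except pd.errors.MergeError as exc:
    print("merge rejected:", exc)
```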
