
There are two other similar questions [1], [2], but neither solved my issue.

I concatenated several columns of df to the beginning and the end of df_new. After that, the number of rows in df_new increased, and I can't find a way to fix this with df.reset_index().

MWE:

import pandas as pd

print("1- df: ", len(df.index),len(df.columns),"\n")
print("1- df_new: ", len(df_new.index),len(df_new.columns),"\n")

# Add 5 columns from df at the beginning of df_new and 2 columns from df at the end
df_new = pd.concat([df[df.columns[0:self.x_col_start]], df_new], axis =1)
df_new = pd.concat([df_new, df[df.columns[self.y_col:]]], axis =1)
print("2- df: ", len(df.index),len(df.columns),"\n")
print("2- df_new: ", len(df_new.index),len(df_new.columns),"\n")

# Reset the index of df_new
df_new.reset_index(level=df_new.index.names, drop=True, inplace=True)
print("3- df: ", len(df.index),len(df.columns),"\n")
print("3- df_new: ", len(df_new.index),len(df_new.columns),"\n")

Output:

1- df: 32770, 178
1- df_new: 32770, 15

2- df: 32770, 178
2- df_new: 61441, 22

3- df: 32770, 178
3- df_new: 61441, 22

What is the issue here?

Amir

1 Answer

After spending hours debugging my code, I found that df had already been modified in earlier blocks of code, so its index no longer matched the index of df_new. Because pd.concat(..., axis=1) aligns rows by index label, the indices needed to be reset before concatenating. Lesson learned.
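A minimal sketch of the effect, using small hypothetical frames in place of the question's data: here `df` has lost some rows in an earlier step (so its index labels are 3..7), while `df_new` carries a fresh 0..4 index. The row count after `concat(axis=1)` equals the size of the union of the two indices. With the question's numbers, and assuming both indices are unique, 32770 + 32770 - 61441 = 4099 would be the number of labels the two frames actually shared.

```python
import pandas as pd

# Hypothetical stand-in for df after earlier blocks dropped rows 0-2:
# 5 rows whose index labels are 3..7.
df = pd.DataFrame({"a": range(8), "b": range(8)}).drop(index=[0, 1, 2])

# Hypothetical stand-in for df_new: 5 rows with a fresh 0..4 index.
df_new = pd.DataFrame({"pc1": range(5)})

# concat(axis=1) aligns rows on index labels (outer join by default),
# so the result gets one row per label in the union of both indices.
misaligned = pd.concat([df[["a"]], df_new, df[["b"]]], axis=1)
print(len(misaligned))                    # 8, padded with NaN
print(len(df.index.union(df_new.index)))  # 8

# Resetting the index first makes the labels line up by position.
df = df.reset_index(drop=True)
aligned = pd.concat([df[["a"]], df_new, df[["b"]]], axis=1)
print(len(aligned))                       # 5
print(aligned.isna().any().any())         # False
```

Resetting aligns rows by position, so it is only correct when both frames are already in the same row order.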

Always do

# drop=True discards the old index instead of inserting it as a new column
# inplace=True modifies df itself instead of returning a new DataFrame
df.reset_index(drop=True, inplace=True)

before you do further manipulations; otherwise, debugging will be a nightmare later on.
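A small sketch of what the two flags do, on a hypothetical frame whose index has gaps after filtering. It also shows the reassignment form, which gives the same result as `inplace=True`.

```python
import pandas as pd

# Hypothetical frame whose index has gaps after filtering (labels 1, 2, 3).
df = pd.DataFrame({"x": [10, 20, 30, 40]}).query("x > 15")
print(df.index.tolist())          # [1, 2, 3]

# Default drop=False moves the old labels into a new "index" column.
kept = df.reset_index()
print(kept.columns.tolist())      # ['index', 'x']

# drop=True discards the old labels and builds a fresh 0..n-1 index.
dropped = df.reset_index(drop=True)
print(dropped.index.tolist())     # [0, 1, 2]
print(dropped.columns.tolist())   # ['x']

# Same result as df.reset_index(drop=True, inplace=True), via reassignment.
df = df.reset_index(drop=True)
```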

Amir