
While renaming a dataframe's columns, I need to preserve the original names. For example:

santandar_data = pd.read_csv(r"train.csv", nrows=40000)  
santandar_data.shape  

santandar_data.original_names=santandar_data.columns

ndf=santandar_data

ndf.original_names

Index(['ID', 'var3', 'var15', 'imp_ent_var16_ult1', 'imp_op_var39_comer_ult1',
       'imp_op_var39_comer_ult3', 'imp_op_var40_comer_ult1',
       'imp_op_var40_comer_ult3', 'imp_op_var40_efect_ult1',
       'imp_op_var40_efect_ult3',
       ...
       'saldo_medio_var33_hace2', 'saldo_medio_var33_hace3',
       'saldo_medio_var33_ult1', 'saldo_medio_var33_ult3',
       'saldo_medio_var44_hace2', 'saldo_medio_var44_hace3',
       'saldo_medio_var44_ult1', 'saldo_medio_var44_ult3', 'var38', 'TARGET'],
      dtype='object', length=371)

The ndf dataframe object has an original_names attribute that works correctly. But when I use the clean_names function, the returned dataframe no longer has it.

df=santandar_data.clean_names(case_type="upper", remove_special=True).limit_column_characters(3)
df.original_names

AttributeError: 'DataFrame' object has no attribute 'original_names'

The clean_names function comes from:

https://github.com/ericmjl/pyjanitor/blob/master/janitor/functions.py

What is the best way to change this function to include original column names as a property value?

Joe
shantanuo

1 Answer

Almost certainly, pyjanitor's clean_names function returns a new dataframe rather than modifying its input. pandas does not carry arbitrary attributes assigned to a DataFrame instance over to a copy, so original_names is lost.
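
The loss of the attribute can be reproduced with plain pandas, without pyjanitor. This is a minimal sketch: the two-column frame and its column names are made up for illustration, and `df.copy()` stands in for whatever copying clean_names does internally, which is an assumption.

```python
import warnings

import pandas as pd

# Made-up frame standing in for the Santander training data.
df = pd.DataFrame({"var 3": [1, 2], "var-15": [3, 4]})

# pandas may emit a UserWarning when a list-like value is assigned as a new
# attribute, because it does not create a column this way. Silence it here.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", UserWarning)
    df.original_names = df.columns

copied = df.copy()

# The attribute lives only on the original instance, not on the copy.
has_on_original = hasattr(df, "original_names")
has_on_copy = hasattr(copied, "original_names")
print(has_on_original, has_on_copy)
```

Any method that builds and returns a new DataFrame can be expected to behave like `copy()` here, which matches the AttributeError in the question.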

But, really, these original column headings don't belong to your pd.DataFrame instance since you can't use them directly for labeling or anything else.

My advice is to store them in a separate variable. If you need to keep them grouped with your dataframe, you can use a dictionary, along with any additional metadata:

df_dct = {'df': santandar_data, 'original_names': santandar_data.columns}

df_dct['df'] = df_dct['df'].clean_names(...)
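
A runnable sketch of the dictionary approach follows. It assumes a hypothetical `simple_clean` helper as a stand-in for pyjanitor's clean_names (it is not pyjanitor's implementation), and the sample column names are invented. Because the original names are stored before cleaning, they can also be zipped with the cleaned names into a lookup table.

```python
import re

import pandas as pd


def simple_clean(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical stand-in for clean_names: returns a renamed copy."""
    new_cols = [
        re.sub(r"[^0-9A-Za-z_]", "", col.replace(" ", "_")).upper()
        for col in df.columns
    ]
    out = df.copy()
    out.columns = new_cols
    return out


# Invented sample data; the real frame would come from pd.read_csv.
raw = pd.DataFrame({"var 3": [1, 2], "imp-op var39": [3, 4]})

# Capture the original names before any cleaning happens.
df_dct = {"df": raw, "original_names": raw.columns.copy()}
df_dct["df"] = simple_clean(df_dct["df"])

# Column order is preserved by the rename, so the two sequences line up.
name_map = dict(zip(df_dct["df"].columns, df_dct["original_names"]))
print(name_map)
```

The positional zip relies on the cleaning step keeping the columns in the same order and not dropping any; if a cleaning step reorders or removes columns, the mapping would need to be built differently.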
jpp