Preserve original column names

Question

While renaming the columns of a dataframe, I need to preserve the original names. For example:

import pandas as pd
import janitor  # pyjanitor; registers clean_names on DataFrame
santandar_data = pd.read_csv(r"train.csv", nrows=40000)
santandar_data.shape

santandar_data.original_names=santandar_data.columns

ndf=santandar_data

ndf.original_names

Index(['ID', 'var3', 'var15', 'imp_ent_var16_ult1', 'imp_op_var39_comer_ult1',
       'imp_op_var39_comer_ult3', 'imp_op_var40_comer_ult1',
       'imp_op_var40_comer_ult3', 'imp_op_var40_efect_ult1',
       'imp_op_var40_efect_ult3',
       ...
       'saldo_medio_var33_hace2', 'saldo_medio_var33_hace3',
       'saldo_medio_var33_ult1', 'saldo_medio_var33_ult3',
       'saldo_medio_var44_hace2', 'saldo_medio_var44_hace3',
       'saldo_medio_var44_ult1', 'saldo_medio_var44_ult3', 'var38', 'TARGET'],
      dtype='object', length=371)

The ndf dataframe object has an original_names attribute that works correctly. But when I use the clean_names function, the attribute is lost:

df=santandar_data.clean_names(case_type="upper", remove_special=True).limit_column_characters(3)
df.original_names

AttributeError: 'DataFrame' object has no attribute 'original_names'

The clean_names function comes from:

https://github.com/ericmjl/pyjanitor/blob/master/janitor/functions.py

What is the best way to change this function to include original column names as a property value?

`clean_names` likely returns a *copy* of your dataframe. I believe in certain versions of Pandas attributes are not guaranteed to be copied across. — jpp, Nov 22 '18 at 09:42

score 1 · Accepted Answer · answered Nov 22 '18 at 10:17

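Almost certainly, pyjanitor's clean_names function returns a copy of the input dataframe, and copying a dataframe does not carry over arbitrary attributes assigned to the instance. Your ndf example works only because ndf=santandar_data binds a second name to the same object; no copy is made.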
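To make the difference concrete, here is a minimal sketch using a small made-up dataframe in place of train.csv (the column values are invented for illustration). It compares a second name for the same object with an explicit copy. The behaviour shown is what I would expect from the pandas versions I am familiar with; it is not a documented guarantee.

```python
import warnings

import pandas as pd

# Toy stand-in for the real data; the values are invented for illustration.
df = pd.DataFrame({"ID": [1, 2], "var3": [3, 4]})

with warnings.catch_warnings():
    # pandas may warn when a list-like value is assigned to a new attribute name.
    warnings.simplefilter("ignore")
    df.original_names = df.columns

same_object = df    # second name for the same object, nothing is copied
copied = df.copy()  # new DataFrame instance

print(same_object is df)                        # identity check
print(hasattr(same_object, "original_names"))   # attribute lives on that one object
print(hasattr(copied, "original_names"))        # the copy does not receive it
```

Any method that builds and returns a new dataframe, as clean_names presumably does, would behave like the copy here.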

But, really, these original column headings don't belong to your pd.DataFrame instance since you can't use them directly for labeling or anything else.

My advice is to store them in a separate variable. If you need to keep them together with your dataframe, you can use a dictionary, which can also hold any additional metadata:

df_dct = {'df': santandar_data, 'original_names': santandar_data.columns}

df_dct['df'] = df_dct['df'].clean_names(...)
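If you do this in several places, the same idea can be wrapped in a small helper. The sketch below is only an illustration: rename_keeping_originals is a hypothetical name, and str.upper stands in for whatever renaming step you actually apply (for example clean_names). It also assumes the renaming step keeps the columns in the same order, so that new and old names can be paired up by position.

```python
import pandas as pd


def rename_keeping_originals(df, renamer):
    """Hypothetical helper: rename columns and return the originals alongside."""
    original = list(df.columns)           # capture names before renaming
    renamed = df.rename(columns=renamer)  # rename returns a new DataFrame
    # Pair new and old names by position (assumes column order is unchanged).
    new_to_original = dict(zip(renamed.columns, original))
    return {
        "df": renamed,
        "original_names": original,
        "new_to_original": new_to_original,
    }


# Toy data; str.upper is a stand-in for the real cleaning function.
toy = pd.DataFrame({"ID": [1, 2], "var3": [3, 4]})
bundle = rename_keeping_originals(toy, str.upper)

print(list(bundle["df"].columns))
print(bundle["original_names"])
print(bundle["new_to_original"]["VAR3"])
```

The mapping from new to original names is an addition beyond the dictionary above; it is handy if you later need to look up where a cleaned column came from.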
