WHY is `rename` with selection of columns not working with a lambda function?

Question

I want to rename a selected subset of columns with a lambda function using `rename`:

import pandas as pd


df = pd.DataFrame({'pre_col1': [1, 2],
                   'pre_col2': [3, 4],
                   'pre_col3': [ 3,  29],
                   'pre_col4': [94, 170],
                   'pre_col5': [31, 115]})

# This works but it renames all of them
# df.rename(columns=lambda x: x.replace('pre_', ''))

# I'm only wanting to edit and rename a selection
df.iloc[:, 2:5] = (df.iloc[:, 2:5]
                     .rename(columns=lambda x: x.replace('pre_', '')))

print(df)

This produces

   pre_col1  pre_col2  pre_col3  pre_col4  pre_col5
0       1.0       3.0       NaN       NaN       NaN
1       2.0       4.0       NaN       NaN       NaN

I know there are many ways to rename columns. I've read here, here, and here.

But why isn't this way working? And why does it fill the columns I'm trying to change with NaNs?

chaining works fine for me `df.rename(columns=lambda x: x.replace("pre_","")).assign(x=1).rename(columns=lambda x: x.replace("c","cc")).rename(columns=lambda x: x.replace("cc","post_c"))` must be something in ur chain changing values — Rob Raymond, Feb 08 '21 at 20:16
@LeviBaguley I run your exact code but do not get the final output with NaN's. Have you run any other steps before this output ? — SeaBean, Feb 09 '21 at 09:55

Rafael Valero · Accepted Answer · 2021-02-09T07:57:12.370

1) Because of the immutability of indexes. 2) Also, the changes happen in a copy of the dataframe (probably because of 1); as suggested by @SeaBean, the new names exist only in that copy.

Option 1) To change the column names.

import pandas as pd
df = pd.DataFrame({'pre_col1': [1, 2],
                   'pre_col2': [3, 4],
                   'pre_col3': [ 3,  29],
                   'pre_col4': [94, 170],
                   'pre_col5': [31, 115]})
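# Pick the columns to rename by position (columns 2, 3 and 4).
columns_to_modify = df.columns.tolist()[2:5]
# Map each old name to its new name; columns not in the dict keep their names.
columns_rename = {}
for i in columns_to_modify:
    columns_rename[i] = i.replace('pre_', '')

df.rename(columns=columns_rename, inplace=True)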
print(df)
   pre_col1  pre_col2  col3  col4  col5
0         1         3     3    94    31
1         2         4    29   170   115
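The loop in Option 1 can also be written as a dict comprehension. This is only a sketch of the same idea; the names `mapping` and `renamed` are mine, and it reassigns the result instead of using inplace=True.

```python
import pandas as pd

df = pd.DataFrame({'pre_col1': [1, 2],
                   'pre_col2': [3, 4],
                   'pre_col3': [3, 29],
                   'pre_col4': [94, 170],
                   'pre_col5': [31, 115]})

# Build the old-name -> new-name mapping for columns 2..4 only.
mapping = {name: name.replace('pre_', '') for name in df.columns[2:5]}

# rename returns a new DataFrame; columns missing from the mapping are left alone.
renamed = df.rename(columns=mapping)
print(list(renamed.columns))
```

Because rename returns a new object here, df itself keeps its original column names until you write df = renamed.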

Option 2) To change the column names.

import pandas as pd
df = pd.DataFrame({'pre_col1': [1, 2],
                   'pre_col2': [3, 4],
                   'pre_col3': [ 3,  29],
                   'pre_col4': [94, 170],
                   'pre_col5': [31, 115]})
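# Note: this writes into the array behind the (nominally immutable) Index.
# It is not a documented way to rename columns and may behave differently across pandas versions.
df.columns.values[2:5] = [x.replace('pre_', '') for x in df.columns.tolist()[2:5]]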
df
   pre_col1  pre_col2  col3  col4  col5
0         1         3     3    94    31
1         2         4    29   170   115
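Option 2 writes into the array behind df.columns. A variant that avoids touching that array is to build a plain list of names and assign the whole list back to df.columns. This is a minimal sketch, not the answer's own method; `new_columns` is a name I made up.

```python
import pandas as pd

df = pd.DataFrame({'pre_col1': [1, 2],
                   'pre_col2': [3, 4],
                   'pre_col3': [3, 29],
                   'pre_col4': [94, 170],
                   'pre_col5': [31, 115]})

# Copy the labels into an ordinary (mutable) Python list.
new_columns = df.columns.tolist()

# Edit only positions 2..4 of that list.
new_columns[2:5] = [name.replace('pre_', '') for name in new_columns[2:5]]

# Replace the whole column index in one assignment.
df.columns = new_columns
print(list(df.columns))
```

Assigning a full list to df.columns swaps in a new Index object, so no existing Index is mutated.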

I believe the original difficulty with df.iloc[:, 2:5] = df.iloc[:, 2:5].rename(columns=lambda x: x.replace('pre_', '')) could be due to the immutability of indexes in dataframes, as discussed in:

Pandas TypeError: Index does not support mutable operations
Regarding the immutability of pandas dataframe indexes
Pandas: Change a specific column name in dataframe having multilevel columns

From those it appears that dataframe indexes are immutable, so they are set all at once and kept that way on purpose. Interestingly, although the indexes appear to be immutable, you can still change their underlying values, as in the second option.

Sure. a) in rename you should introduce a dictionary of columns, and I am not sure a lambda function there is valid input for the variable. Notice in the above case, is a dictionary. — Rafael Valero, Feb 08 '21 at 21:04
`columns` argument can be dict-like or function. See documentation [here](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.rename.html#pandas-dataframe-rename) — Levi Baguley, Feb 08 '21 at 21:18
I think the answer is immutability of indexes in dataframes. As above. — Rafael Valero, Feb 08 '21 at 22:49

SeaBean · Answer 2 · 2021-02-09T09:51:19.123

This answer aims to give a hint on OP's question "why isn't this way working?" rather than to provide alternative workable solutions, which OP has already got from other posts.

Because of the rename call, the expression df.iloc[:, 2:5].rename(columns=lambda x: x.replace('pre_', '')) as a whole produces a copy instead of a view. The pandas user guide states:

The rename() method also provides an inplace named parameter that is by default False and copies the underlying data. Pass inplace=True to rename the data in place.

The pandas API reference for DataFrame.rename also states:

Returns: DataFrame or None. DataFrame with the renamed axis labels or None if inplace=True.

The fact that rename (without inplace=True) returns a copy rather than a view can be verified as follows:

df.iloc[:, 2:5].rename(columns=lambda x: x.replace('pre_', ''))._is_view

Output:  False

while without the rename part:

df.iloc[:, 2:5]._is_view

Output:  True

Hence, your code only renamed the copy without touching the original df.
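The same point can be checked without the private _is_view attribute. The sketch below only uses the OP's data and public attributes; `subset` and `renamed` are names chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({'pre_col1': [1, 2],
                   'pre_col2': [3, 4],
                   'pre_col3': [3, 29],
                   'pre_col4': [94, 170],
                   'pre_col5': [31, 115]})

subset = df.iloc[:, 2:5]
renamed = subset.rename(columns=lambda x: x.replace('pre_', ''))

# rename (without inplace=True) hands back a different object ...
print(renamed is subset)
# ... which carries the new labels ...
print(list(renamed.columns))
# ... while the labels of the original df are untouched.
print(list(df.columns))
```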

Let's take a further example:

    import numpy as np
    import pandas as pd

    data = pd.DataFrame(np.arange(9).reshape((3, 3)), index=['apple', 'orange', 'pear'], columns=['one', 'two', 'three'])

    data.rename(index=str.title, columns=str.upper)
    Output:

          ONE   TWO THREE
    Apple   0     1     2
    Orange  3     4     5
    Pear    6     7     8


    data                            # not changed after rename()
    Output:

             one    two three
    apple      0      1     2
    orange     3      4     5
    pear       6      7     8

Strangely, when I ran your code and printed df afterwards, it showed the original values rather than NaN in the last 3 columns. You may want to rerun your code to check.
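One possible explanation for the differing results, offered as my assumption rather than something confirmed in this thread: depending on the pandas version, assigning a DataFrame through an indexer may either align the right-hand side on column labels (the renamed labels then match nothing, which would give NaN) or assign purely by position (which would keep the values). The sketch below just reports what the installed version does. In either case the column labels of df are not changed by the assignment.

```python
import pandas as pd

df = pd.DataFrame({'pre_col1': [1, 2],
                   'pre_col2': [3, 4],
                   'pre_col3': [3, 29],
                   'pre_col4': [94, 170],
                   'pre_col5': [31, 115]})

renamed = df.iloc[:, 2:5].rename(columns=lambda x: x.replace('pre_', ''))

# The OP's assignment: only values are written, never column labels.
df.iloc[:, 2:5] = renamed

# True would mean the values were lost to NaN; False means they were kept.
all_nan = bool(df.iloc[:, 2:5].isna().all().all())

print(pd.__version__)
print(list(df.columns))
print(all_nan)
```

Running this on the versions used by the OP and by the commenters would show whether the version really accounts for the difference.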

sammywemmy · Answer 3 · 2021-02-08T23:00:12.717

In the lambda function, you could use the if/else concept to rename the columns:

df.rename(columns=lambda x: x.split("_")[-1]
                            if int(x[-1]) in range(3, 6) 
                            else x)

   pre_col1  pre_col2  col3  col4  col5
0         1         3     3    94    31
1         2         4    29   170   115

Sticking to your code, the if/else concept works:

df.rename(columns=lambda x: x.replace("pre_", "") 
                            if int(x[-1]) in range(3, 6) 
                            else x)

You can simply reassign to the old dataframe:

df = df.rename(columns=lambda x: x.replace("pre_", "") 
                                if int(x[-1]) in range(3, 6) 
                                else x)

edited Feb 08 '21 at 23:00

answered Feb 08 '21 at 22:26

Notice that the new names are not in the "old"/previous dataframe df. As suggested by @SeaBean those are just in a copy. – Rafael Valero Feb 08 '21 at 22:52
you can simply reassign the outcome. – sammywemmy Feb 08 '21 at 22:59
That could be a solution : ) – Rafael Valero Feb 08 '21 at 23:00
Neat, but this only works for the toy data I provided since it assumes you will have a numerical index at the end of the column name. – Levi Baguley Feb 09 '21 at 19:21
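To address the limitation raised in the last comment, the if/else condition can test membership in a set of column names picked by position, instead of parsing a trailing digit. This is a sketch under my own assumptions: the column names below are hypothetical (chosen to have no numeric suffix), and `targets` is a helper name I introduced.

```python
import pandas as pd

# Hypothetical columns without a numeric suffix.
df = pd.DataFrame({'pre_id': [1, 2],
                   'pre_name': ['a', 'b'],
                   'pre_height': [3, 29],
                   'pre_weight': [94, 170],
                   'pre_age': [31, 115]})

# Choose the columns to rename by position, then remember their names.
targets = set(df.columns[2:5])

# Rename only the chosen columns; every other name passes through unchanged.
df = df.rename(columns=lambda name: name.replace('pre_', '')
                                    if name in targets
                                    else name)
print(list(df.columns))
```

This keeps the single rename call with a lambda, while the selection logic no longer depends on how the columns are named.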
