I want to rename a selected portion of columns with a lambda function using rename

import pandas as pd


df = pd.DataFrame({'pre_col1': [1, 2],
                   'pre_col2': [3, 4],
                   'pre_col3': [ 3,  29],
                   'pre_col4': [94, 170],
                   'pre_col5': [31, 115]})

# This works but it renames all of them
# df.rename(columns=lambda x: x.replace('pre_', ''))

# I'm only wanting to edit and rename a selection
df.iloc[:, 2:5] = (df.iloc[:, 2:5]
                     .rename(columns=lambda x: x.replace('pre_', '')))

print(df)

This produces

   pre_col1  pre_col2  pre_col3  pre_col4  pre_col5
0       1.0       3.0       NaN       NaN       NaN
1       2.0       4.0       NaN       NaN       NaN

I know there are many ways to rename columns. I've read here, here, and here.

But why isn't this way working? And why does it fill the columns I'm trying to change with NaNs?

Levi Baguley
  • Chaining works fine for me: `df.rename(columns=lambda x: x.replace("pre_","")).assign(x=1).rename(columns=lambda x: x.replace("c","cc")).rename(columns=lambda x: x.replace("cc","post_c"))`. Something in your chain must be changing the values. – Rob Raymond Feb 08 '21 at 20:16
  • Did you run the exact code in the question? – Levi Baguley Feb 08 '21 at 20:29
  • @LeviBaguley I ran your exact code but did not get the final output with NaNs. Did you run any other steps before this output? – SeaBean Feb 09 '21 at 09:55

3 Answers

There are two reasons: 1) indexes are immutable, and 2) as @SeaBean suggested, the changes happen in a copy of the dataframe (probably because of 1), not in the original.

Option 1) Change the column names with a dictionary:

import pandas as pd
df = pd.DataFrame({'pre_col1': [1, 2],
                   'pre_col2': [3, 4],
                   'pre_col3': [ 3,  29],
                   'pre_col4': [94, 170],
                   'pre_col5': [31, 115]})
columns_to_modify = df.columns.tolist()[2:5]
columns_rename = {}
for i in columns_to_modify:
    columns_rename[i] = i.replace('pre_', '')

df.rename(columns=columns_rename, inplace=True)
print(df)
   pre_col1  pre_col2  col3  col4  col5
0         1         3     3    94    31
1         2         4    29   170   115

Option 2) Change the column names by editing the underlying values:

import pandas as pd
df = pd.DataFrame({'pre_col1': [1, 2],
                   'pre_col2': [3, 4],
                   'pre_col3': [ 3,  29],
                   'pre_col4': [94, 170],
                   'pre_col5': [31, 115]})
df.columns.values[2:5] = list(map(lambda x: x.replace('pre_', '') ,df.columns.tolist()[2:5]))
df
   pre_col1  pre_col2  col3  col4  col5
0         1         3     3    94    31
1         2         4    29   170   115
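A variation on Option 2 that avoids writing into df.columns.values is to build the complete list of labels and assign it back to df.columns in one step. This is only a minimal sketch, not part of the original answer; the name new_columns is a hypothetical helper introduced for illustration.

```python
import pandas as pd

df = pd.DataFrame({'pre_col1': [1, 2],
                   'pre_col2': [3, 4],
                   'pre_col3': [3, 29],
                   'pre_col4': [94, 170],
                   'pre_col5': [31, 115]})

# new_columns is a hypothetical helper name: a plain Python list of labels.
new_columns = list(df.columns)

# Change only the labels at positions 2, 3 and 4.
new_columns[2:5] = [name.replace('pre_', '') for name in new_columns[2:5]]

# Assigning a full list replaces the column Index as a whole,
# so no existing Index object has to be mutated.
df.columns = new_columns
print(df.columns.tolist())
```

The data is untouched; only the labels change.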

I believe that the original difficulty with df.iloc[:, 2:5] = df.iloc[:, 2:5].rename(columns=lambda x: x.replace('pre_', '')) could be due to the immutability of indexes in dataframes, as discussed in:

  1. Pandas TypeError: Index does not support mutable operations
  2. Regarding the immutability of pandas dataframe indexes
  3. Pandas: Change a specific column name in dataframe having multilevel columns

From those, it appears that dataframe indexes are immutable, so they are set all at once and kept that way on purpose. Interestingly, although the indexes appear to be immutable, you can still change the underlying values, as in the second option.
Rafael Valero
  • Sure. a) In rename you should pass a dictionary of columns, and I am not sure a lambda function is valid input for that argument. Notice that the case above uses a dictionary. – Rafael Valero Feb 08 '21 at 21:04
  • b) Also, the purpose of the parentheses in ... = (df.iloc ... is not clear to me. – Rafael Valero Feb 08 '21 at 21:06
  • `columns` argument can be dict-like or function. See documentation [here](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.rename.html#pandas-dataframe-rename) – Levi Baguley Feb 08 '21 at 21:18
  • I think the answer is immutability of indexes in dataframes. As above. – Rafael Valero Feb 08 '21 at 22:49

This answer aims to give a hint on the OP's question, "why isn't this way working?", rather than to provide alternative workable solutions, which the OP has already found in other posts.

Because of the rename step, the expression df.iloc[:, 2:5].rename(columns=lambda x: x.replace('pre_', '')) as a whole produces a copy instead of a view. The pandas user guide states:

The rename() method also provides an inplace named parameter that is by default False and copies the underlying data. Pass inplace=True to rename the data in place.

The pandas API reference for DataFrame.rename also states:

Returns: DataFrame or None. DataFrame with the renamed axis labels or None if inplace=True.

The fact that rename (without inplace=True) returns a copy rather than a view can be verified as follows:

df.iloc[:, 2:5].rename(columns=lambda x: x.replace('pre_', ''))._is_view

Output:  False

while without the rename part:

df.iloc[:, 2:5]._is_view

Output:  True

Hence, your code only renamed the copy without touching the original df.
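The same point can be checked with public attributes only, since _is_view is a private attribute and may not behave the same across pandas versions. The following is a minimal sketch, assuming the df from the question; subset_renamed is a hypothetical name introduced for illustration.

```python
import pandas as pd

df = pd.DataFrame({'pre_col1': [1, 2],
                   'pre_col2': [3, 4],
                   'pre_col3': [3, 29],
                   'pre_col4': [94, 170],
                   'pre_col5': [31, 115]})

# rename() without inplace=True returns a new DataFrame.
subset_renamed = df.iloc[:, 2:5].rename(columns=lambda x: x.replace('pre_', ''))

# The new object carries the new labels ...
print(subset_renamed.columns.tolist())

# ... while the labels of the original df are unchanged.
print(df.columns.tolist())
```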

Let's take a further example:

    import numpy as np
    import pandas as pd

    data = pd.DataFrame(np.arange(9).reshape((3, 3)), index=['apple', 'orange', 'pear'], columns=['one', 'two', 'three'])

    data.rename(index=str.title, columns=str.upper)
    Output:

          ONE   TWO THREE
    Apple   0     1     2
    Orange  3     4     5
    Pear    6     7     8


    data                            # not changed after rename()
    Output:

             one    two three
    apple      0      1     2
    orange     3      4     5
    pear       6      7     8

Strangely, when I printed df after running your rename code in my trial run, it showed the original values rather than having the last 3 columns replaced with NaN. You can rerun your code to check.

SeaBean

In the lambda function, you could use a conditional expression (if/else) to rename only some of the columns:

df.rename(columns=lambda x: x.split("_")[-1]
                            if int(x[-1]) in range(3, 6) 
                            else x)

    pre_col1    pre_col2    col3    col4    col5
0         1           3       3       94    31
1         2           4       29      170   115

Sticking to your code, the if/else concept works:

df.rename(columns=lambda x: x.replace("pre_", "") 
                            if int(x[-1]) in range(3, 6) 
                            else x)

You can simply reassign to the old dataframe:

df = df.rename(columns=lambda x: x.replace("pre_", "") 
                                if int(x[-1]) in range(3, 6) 
                                else x)
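The condition int(x[-1]) relies on every column name ending in a digit. If that cannot be guaranteed, one option is to select the columns by position first and test membership in the lambda. This is a minimal sketch under that assumption, not part of the original answer; targets is a hypothetical name introduced for illustration.

```python
import pandas as pd

df = pd.DataFrame({'pre_col1': [1, 2],
                   'pre_col2': [3, 4],
                   'pre_col3': [3, 29],
                   'pre_col4': [94, 170],
                   'pre_col5': [31, 115]})

# targets is a hypothetical helper: the labels at positions 2, 3 and 4.
targets = set(df.columns[2:5])

# Rename a label only if it is one of the selected columns.
df = df.rename(columns=lambda x: x.replace('pre_', '') if x in targets else x)
print(df.columns.tolist())
```

This keeps the selection positional, as in the question's df.iloc[:, 2:5], while still going through rename.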
sammywemmy