Consider the following data frame:

import pandas as pd
df = pd.DataFrame([[1, -2, 3, -5, 4, 2, 7, -8, 2], [2, -4, 6, 7, -8, 9, 5, 3, 2], [2, 4, 6, 7, 8, 9, 5, 3, 2], [1, 2, 3, 4, 5, 6, 7, 8, 9]]).transpose()
df.columns = ["A", "B", "C", "D"]

   A  B  C  D
0  1  2  2  1
1 -2 -4  4  2
2  3  6  6  3
3 -5  7  7  4
4  4 -8  8  5
5  2  9  9  6
6  7  5  5  7
7 -8  3  3  8
8  2  2  2  9

I want to append "pos" to the name of every column that contains only positive values. This is what I tried:

pos_idx = df.loc[:, (df>0).all()].columns
df[pos_idx].columns = df[pos_idx].columns + "pos"

However, this does not seem to work: it raises no error, but the column names are not changed. What is more interesting is that this code:

df.columns = df.columns + "anything"

does add the word "anything" to every column name. Could you please explain why this happens (it works in the general case but not in the indexed case), and how to do this correctly?

John

2 Answers

1

You are saving the new column names onto a copy of the dataframe. The statement below does not overwrite the column names of df, only those of the slice df[pos_idx]:

df[pos_idx].columns = df[pos_idx].columns + "pos"

Your second code example accesses df directly, which is why that one works.
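
A minimal sketch of why the first attempt is silently lost, assuming a small hypothetical two-column frame (the names `subset` and the data are illustrative only): indexing with a list of labels builds a new DataFrame object, so assigning to its `.columns` never touches `df`.

```python
import pandas as pd

# Hypothetical two-column frame, used only for illustration.
df = pd.DataFrame({"A": [1, -2, 3], "D": [1, 2, 3]})
pos_idx = df.loc[:, (df > 0).all()].columns  # only "D" is all-positive

subset = df[pos_idx]                      # __getitem__ returns a new DataFrame
subset.columns = subset.columns + "pos"   # renames the new object only

print(subset.columns.tolist())  # ['Dpos']
print(df.columns.tolist())      # ['A', 'D'] (unchanged)
```

In the original one-liner the new object is never bound to a name, so the renamed frame is discarded as soon as the statement finishes.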

How to make it work? --> Define the full list of column names separately, then assign it to df.columns directly.

How to define the full list? Add "_pos" as a suffix to every column that has no values <= 0.

my_col_list = [col + "_pos" if count == 0 else col for col, count in (df <= 0).sum().items()]
df.columns = my_col_list
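
The same idea can be sketched without a comprehension, assuming the frame from the question. This is one possible variant rather than the method above: it builds a boolean mask with `(df > 0).all()` and lets `np.where` choose between the old and the suffixed names. The names `mask` and `new_cols` are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[1, -2, 3, -5, 4, 2, 7, -8, 2],
     [2, -4, 6, 7, -8, 9, 5, 3, 2],
     [2, 4, 6, 7, 8, 9, 5, 3, 2],
     [1, 2, 3, 4, 5, 6, 7, 8, 9]]
).transpose()
df.columns = ["A", "B", "C", "D"]

mask = (df > 0).all()  # True for columns holding only positive values
# Take the suffixed name where the mask is True, the old name otherwise.
new_cols = np.where(mask, df.columns + "_pos", df.columns)
df.columns = new_cols  # assign to df itself, not to a slice

print(df.columns.tolist())  # ['A', 'B', 'C_pos', 'D_pos']
```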
KingOtto
  • That seems really strange to me, since, for example, `df[pos_idx] = np.log(df[pos_idx])` overwrites the real `df` in exactly the columns given by `pos_idx`. – John Jul 14 '22 at 10:11
  • That's correct, but there you are still assigning something back to `df`(!) itself, more specifically into `df` at the index positions `[pos_idx]`. Whereas if you use `df[pos_idx]` as a "getter" and then access its attribute `.columns`, you are *not* assigning anything to the original df any longer. Are you aware of the issue discussed here: https://stackoverflow.com/questions/20625582/how-to-deal-with-settingwithcopywarning-in-pandas – KingOtto Jul 14 '22 at 10:22
  • Can you try out `df[pos_idx]['foo'] = 5` and look at the warning message? Your action above is doing the same thing (just not adding a new column; rather, you're trying to overwrite the column names of a **subset** of the data). You are not assigning to the variable `df` that you think you are assigning to, but to a copy/subset of it (which gets dropped after the statement completes). – KingOtto Jul 14 '22 at 10:26
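
The getter/setter distinction from the comments above can be sketched as follows, assuming a small hypothetical frame: `df[cols] = ...` is a call to `DataFrame.__setitem__` on `df` itself, while `df[cols].columns = ...` first calls `__getitem__` and then sets an attribute on the object that call returned.

```python
import numpy as np
import pandas as pd

# Hypothetical frame, used only for illustration.
df = pd.DataFrame({"A": [1.0, -2.0], "D": [1.0, 2.0]})
cols = ["D"]

# Setter: equivalent to df.__setitem__(cols, value), so df is modified.
df[cols] = np.log(df[cols])
print(df["D"].tolist())     # log(1) and log(2)

# Getter, then attribute assignment: only the temporary object is renamed.
df[cols].columns = ["Dpos"]
print(df.columns.tolist())  # ['A', 'D']
```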
1

First of all, use the .rename() method to change the name of a column.

To add 'pos' to the columns that contain only positive values, you can use this:

renamed_columns = {i: i + ' pos' for i in df.columns if df[i].min() > 0}
df.rename(columns=renamed_columns,inplace=True)
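
A sketch of a variant without `inplace=True`, assuming the frame from the question: `rename` also accepts a function for `columns`, and it returns a new frame that has to be assigned to a name. The name `renamed` is illustrative, and the suffix here is `"pos"` as in the question.

```python
import pandas as pd

df = pd.DataFrame(
    [[1, -2, 3, -5, 4, 2, 7, -8, 2],
     [2, -4, 6, 7, -8, 9, 5, 3, 2],
     [2, 4, 6, 7, 8, 9, 5, 3, 2],
     [1, 2, 3, 4, 5, 6, 7, 8, 9]]
).transpose()
df.columns = ["A", "B", "C", "D"]

# The function is called once per column label and returns the new label.
renamed = df.rename(columns=lambda c: c + "pos" if (df[c] > 0).all() else c)

print(renamed.columns.tolist())  # ['A', 'B', 'Cpos', 'Dpos']
print(df.columns.tolist())       # ['A', 'B', 'C', 'D'] (original untouched)
```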
  • That does not really answer the question... it's a nice workaround, but the original question is pretty much around how to *assign* column names, not how to avoid the problem by using the built-in "rename", and then using a for loop to kill the problem ;) – KingOtto Jul 14 '22 at 10:04