1

I need to rename columns in pandas so that several different column names associated to a pre-defined keyword have their titles replaced by that key word.

I would like that a list of several different potential column names are associated to one key word, which I can then use to later group information. It is not like this problem in Renaming columns in pandas as this does not address the use of multiple column names that can be associated to one keyword.

For instance: cats, dogs, birds, fish -> are replaced with the title "animals"

I was looking at the rename function HERE and HERE , however, it does not seem to take into account the possibility to have multiple columns be associated to a key word to rename.

Is this posisble to do within pandas?

My (not-working) attempt so far is as follows:

newNames = {
    'animals':['cats','dogs','fish'],
    'colors':['red','blue','green']
}

sample df:

cats    dogs    fish    red
1   2   3   2
2   3   5   4
3   4   3   4

df.rename(index=str,columns=newNames,inplace=True)

desired result:

animals     animals     animals     colors
1   2   3   2
2   3   5   4
3   4   3   4
owwoow14
  • 1,694
  • 8
  • 28
  • 43
  • Duplicate columns are generally a very bad idea. It may make your results unusable. Are you sure this is what you want? – jpp May 29 '18 at 14:36
  • its not a duplicate @DenisRasulev as that question does not address the problem I am facing. – owwoow14 May 29 '18 at 14:36
  • @jpp yes because I later want to group information from the headers – owwoow14 May 29 '18 at 14:37
  • @jpp Because this is a sample data, there are thousands of different csvs that I will go through and i want to normalize the column names. – owwoow14 May 29 '18 at 14:39
  • Sure, `pandas` can deal with thousands of dataframes too. Naming columns the same, like this, almost certainly isn't the best way to approach your problem. – jpp May 29 '18 at 14:41

3 Answers3

2

Create your dict using melt

df.rename(columns=pd.DataFrame(newNames).melt().set_index('value').variable.to_dict())
Out[275]: 
   animals  animals  animals  colors
0        1        2        3       2
1        2        3        5       4
2        3        4        3       4
BENY
  • 317,841
  • 20
  • 164
  • 234
2

IIUC, you could - as an alternative to using the same name for many columns (which might be a bad idea) - consider using MultiIndexing

For example:

categories = {"animals": ["cats", "dogs", "fish"],
              "colors" : ["red"]}

df.columns = pd.MultiIndex.from_tuples([(k, sub) for k,v in categories.items() for sub in v])

Then your output would be something like:

        animals                 colors

        cats    dogs    fish    red
0       1       2       3       2
1       2       3       5       4
2       3       4       3       4
rafaelc
  • 57,686
  • 15
  • 58
  • 82
1

Will this work for you?

import pandas as pd
df = pd.DataFrame({"cats": [1, 2, 3], "dogs": [4, 5, 6], "fish": [7, 8, 9], "red": [10, 11, 12],})

# df
   cats  dogs  fish  red
0     1     4     7   10
1     2     5     8   11
2     3     6     9   12

new_names = {
    "cats": "animals",
    "dogs": "animals",
    "fish": "animals",
    "red": "colors"
    }

new_df = df.rename(index=str, columns=new_names)

# new_df
   animals  animals  animals  colors
0        1        4        7      10
1        2        5        8      11
2        3        6        9      12

If column name is not listed in new_names dictionary, then it just stays the same. Dimension of a data frame doesn't matter in this case.

Example:

df2 = pd.DataFrame({"cats": [1, 2, 3], "digs": [4, 5, 6], "fish": [7, 8, 9], "worm": [10, 11, 12], "blue": [10, 11, 12]})

# df2
   cats  digs  fish  worm  blue
0     1     4     7    10    10
1     2     5     8    11    11
2     3     6     9    12    12

new_df2 = df2.rename(index=str, columns=new_names)

# new_df2
   animals  digs  animals  worm  blue
0        1     4        7    10    10
1        2     5        8    11    11
2        3     6        9    12    12
Denis Rasulev
  • 3,744
  • 4
  • 33
  • 47
  • this does work, but one thing to mention is that not all of the dfs will contain the same columns. Meaning that some might not have to be renamed as "colors" because they do not contain any key words associated to that. In this case, this solution throws a key error due to the array shape. is there a way to over cme that? – owwoow14 May 29 '18 at 15:08
  • Well, this is strange. I just tried various data frames with different number of columns with different names, existing in `new_names` dictionary and not. Didn't get any error. Value not listed in `new_names` dictionary just stay the same. Can you copy-paste error message? – Denis Rasulev May 29 '18 at 15:18
  • of course: ValueError: arrays must all be same length – owwoow14 May 29 '18 at 15:28
  • There must be something wrong here. You are not working with array vs another array. You just replacing column names according to the dictionary. If there is not dictionary key for a column name then it remains unchanged. In this case length of the array does not matter. – Denis Rasulev May 29 '18 at 15:46