-1

I am trying to rename columns in multiple dataframes and convert those columns to integers. This is the code I have:

def clean_col(df,col_name):
    df.reset_index(inplace=True)
    df.rename(columns={df.columns[0]:'Date', df.columns[1]: col_name},inplace=True)
    df[col_name]=df[col_name].apply(lambda x: int(x))

I have a dictionary that maps each dataframe to its new column name:

d = {
    all_df: "all",
    coal_df: "coal",
    liquids_df: "liquids",
    coke_df: "coke",
    natural_gas_df: "natural_gas",
    nuclear_df: "nuclear",
    hydro_electricity_df: "hydro",
    wind_df: "wind",
    utility_solar_df: "utility_solar",
    geothermal_df: "geo_thermal",
    wood_biomass_df: "biomass_wood",
    biomass_other_df: "biomass_other",
    other_df: "other",
    solar_all_df: "all_solar",
}
for i, (key, value) in enumerate(d.items()):
    clean_col(key, value)

And this is the error I am getting:

TypeError: 'DataFrame' objects are mutable, thus they cannot be hashed

Any help would be appreciated.

a11
Shawn Jamal

3 Answers

2

You are on the right track by using a dictionary to link your old and new column names. If you loop through a list of your dataframes and, for each one, loop through the dictionary of new column names, that will work.

import pandas as pd

df1 = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})
df2 = pd.DataFrame({"A": [1, 2, 3], "D": [4, 5, 6], "F": [4, 5, 6]})
all_dfs = [df1, df2]

# display() is available in Jupyter/IPython; use print() elsewhere
display(df1)
display(df2)


d = {
    "A": "aaaaa",
    "D": "ddddd",
}
for df in all_dfs:
    for col in d:
        # only rename columns that actually exist in this dataframe
        if col in df.columns:
            df.rename(columns={col: d[col]}, inplace=True)

display(df1)
display(df2)
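
As a side note, `DataFrame.rename` ignores mapping keys that do not match any column by default (`errors='ignore'`), so the inner loop and the membership check can probably be dropped. A minimal sketch, reusing the same toy frames:

```python
import pandas as pd

df1 = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})
df2 = pd.DataFrame({"A": [1, 2, 3], "D": [4, 5, 6], "F": [4, 5, 6]})

d = {"A": "aaaaa", "D": "ddddd"}

for df in [df1, df2]:
    # Keys that are not columns of this frame (e.g. "D" in df1) are skipped.
    df.rename(columns=d, inplace=True)

print(list(df1.columns))
print(list(df2.columns))
```

Passing `errors='raise'` instead would make a missing column an error, which may be preferable if every frame is expected to contain every key.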


a11
  • @ShawnJamal great, glad it worked. I just tidied up the loop a bit, if that is of interest – a11 Aug 15 '21 at 03:18
  • To highlight: the trick is that we aren't really using the dictionary as a lookup structure, but simply as an *associative* structure. It doesn't matter which is the "key" and which is the "value", since we won't use one to find the other. In fact, it would arguably be better to just use multiple tuples instead. (Although it might have been better to have this dictionary structure rather than the separate DataFrame variables *in the first place*.) – Karl Knechtel Jul 05 '22 at 00:34
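
To illustrate the comment above: the original `TypeError` comes from using DataFrames as dictionary keys, which requires hashing them. A list of `(dataframe, name)` tuples associates the two without hashing anything. This is only a sketch; `coal_df`, `wind_df` and the `value` column are hypothetical stand-ins for the question's data.

```python
import pandas as pd

# Hypothetical stand-ins for the question's dataframes.
coal_df = pd.DataFrame({"value": ["10", "20"]}, index=["2020-01", "2020-02"])
wind_df = pd.DataFrame({"value": ["3", "4"]}, index=["2020-01", "2020-02"])

# DataFrames are mutable and refuse to be hashed, so they cannot be dict keys.
try:
    hash(coal_df)
    hashable = True
except TypeError:
    hashable = False

# Tuples in a list only pair the objects up; nothing is hashed.
pairs = [(coal_df, "coal"), (wind_df, "wind")]
for df, name in pairs:
    df.rename(columns={"value": name}, inplace=True)

print(hashable, list(coal_df.columns), list(wind_df.columns))
```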
2

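You can look the dataframes up by their variable names using globals() (or locals()), so the dictionary keys can be plain strings.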

import pandas as pd
import io

data1 = '''id,name
1,A
2,B
3,C
4,D
'''
data2 = '''id,name
1,W
2,X
3,Y
4,Z
'''

df1 = pd.read_csv(io.StringIO(data1))
df2 = pd.read_csv(io.StringIO(data2))


def clean_function(dfname, col_name):
    df = globals()[dfname]   # also see locals()
    df.rename(columns={df.columns[0]:'NewID', df.columns[1]: col_name},inplace=True)
    return df

mydict = { 'df1': 'NewName', 'df2': 'AnotherName'}

for k,v in mydict.items():
    df = clean_function(k,v)
    print(df)

Output:

   NewID NewName
0      1       A
1      2       B
2      3       C
3      4       D
   NewID AnotherName
0      1           W
1      2           X
2      3           Y
3      4           Z
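
If looking variables up through `globals()` feels fragile, one alternative is to keep the frames in an ordinary dictionary keyed by name from the start. A rough sketch under that assumption; the `frames` and `new_names` dictionaries are illustrative names, not part of the code above:

```python
import io
import pandas as pd

# Keep the frames in a dict keyed by a string name instead of separate variables.
frames = {
    "df1": pd.read_csv(io.StringIO("id,name\n1,A\n2,B\n")),
    "df2": pd.read_csv(io.StringIO("id,name\n1,W\n2,X\n")),
}
new_names = {"df1": "NewName", "df2": "AnotherName"}

for key, col_name in new_names.items():
    df = frames[key]
    # rename() without inplace returns a new frame, which replaces the old entry.
    frames[key] = df.rename(
        columns={df.columns[0]: "NewID", df.columns[1]: col_name}
    )

for key, df in frames.items():
    print(key, list(df.columns))
```

String keys are hashable, so this avoids the original error without reaching into the module namespace.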
S2L
0

I just created two lists, one of the dataframes and one of the new column names, and iterated over them together:

def clean_col(df, col_name):
    df.reset_index(inplace=True)
    df.rename(columns={df.columns[0]: 'Date', df.columns[1]: col_name}, inplace=True)
    df[col_name] = df[col_name].apply(lambda x: int(x))

list_df = [all_df, coal_df, liquids_df, coke_df, natural_gas_df, nuclear_df, hydro_electricity_df, wind_df, utility_solar_df, geothermal_df, wood_biomass_df, biomass_other_df, other_df, solar_all_df]
list_col = ['total', 'coal', 'liquids', 'coke', 'natural_gas', 'nuclear', 'hydro', 'wind', 'utility_solar', 'geo_thermal', 'biomass_wood', 'biomass_other', 'other', 'all_solar']

for a, b in zip(list_df, list_col):
    clean_col(a, b)
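
One thing to watch with `zip` is that it silently stops at the shorter list, so a missing name would go unnoticed. Below is a hedged variant that checks the lengths first and returns new frames instead of modifying the inputs; the two small frames and the `cleaned` dictionary are made up for illustration, and `astype(int)` is swapped in for the `apply` call.

```python
import pandas as pd

def clean_col(df, col_name):
    # Work on the result of reset_index() so the caller's frame is untouched.
    out = df.reset_index()
    out = out.rename(columns={out.columns[0]: "Date", out.columns[1]: col_name})
    out[col_name] = out[col_name].astype(int)
    return out

# Made-up sample data standing in for the real dataframes.
coal_df = pd.DataFrame({"v": [1.0, 2.0]}, index=["2020-01", "2020-02"])
wind_df = pd.DataFrame({"v": [3.0, 4.0]}, index=["2020-01", "2020-02"])

list_df = [coal_df, wind_df]
list_col = ["coal", "wind"]

# zip() would otherwise drop unmatched items without warning.
if len(list_df) != len(list_col):
    raise ValueError("list_df and list_col must have the same length")

cleaned = {name: clean_col(df, name) for df, name in zip(list_df, list_col)}
print(cleaned["coal"])
```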
Shawn Jamal