4

I have a df in which I need to rename 40 column names to empty string. this can be achieved by using .rename(), but I need to provide all the column names in dict, which needs to be renamed. I was searching for some better way to rename columns by some pattern matching. wherever it finds NULL/UNNAMED in column name, replace that with empty string.

df1: original df (In actual df, i have around 20 columns as NULL1-NULL20 and 20 columns as UNNAMED1-UNNAMED20)

    NULL1   NULL2   C1  C2  UNNAMED1    UNNAMED2
0   1   11  21  31  41  51
1   2   22  22  32  42  52
2   3   33  23  33  43  53
3   4   44  24  34  44  54

desired output df:

            C1  C2      
0   1   11  21  31  41  51
1   2   22  22  32  42  52
2   3   33  23  33  43  53
3   4   44  24  34  44  54

This can be achieved by

df.rename(columns={'NULL1':'', 'NULL2':'', 'UNNAMED1':'', 'UNNAMED2':''}, inplace=True)

But I dont want to create the long dictionary of 40 elements

Sociopath
  • 13,068
  • 19
  • 47
  • 75
Nitesh Selkari
  • 67
  • 1
  • 2
  • 8

5 Answers5

4

If you want to stick with rename:

def renaming_fun(x):
    if "NULL" in x or "UNNAMED" in x:
        return "" # or None
    return x

df = df.rename(columns=renaming_fun)

It can be handy if the mapping function gets more complex. Otherwise, list comprehensions will do:

df.columns = [renaming_fun(col) for col in cols]

Another possibility:

df.columns = map(renaming_fun, df.columns)

But as it was already mentioned, renaming with empty strings is not something you would usually do.

adessy
  • 87
  • 5
1

Is it possible, but be carefull - then if need select one empty column get all empty columns, because duplicated columns names:

print (df[''])

0  1  11  41  51
1  2  22  42  52
2  3  33  43  53
3  4  44  44  54

Use startswith for get all columns by tuples in list comprehension:

df.columns = ['' if c.startswith(('NULL','UNNAMED')) else c for c in df.columns]

Your solution should be changed:

d = dict.fromkeys(df.columns[df.columns.str.startswith(('NULL','UNNAMED'))], '')
print (d)
{'NULL1': '', 'NULL2': '', 'UNNAMED1': '', 'UNNAMED2': ''}
df = df.rename(columns=d)
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • Is it possible to pass Django model's `verbose_name` to dataframe's column names? – Love Putin Not War Jun 07 '20 at 06:03
  • 1
    @user12379095 - Never working with Django, but if possible convert it to lists same length like number of columns then yes. `df.columns = L` – jezrael Jun 07 '20 at 06:05
  • What I have done is, create a list of column name and verbose_name from the table (as tuples) like `[('client_name', 'Client Name'), ('country', 'Country'), ('product', 'Product'), ('price', 'Price')]`. Now my dataframe containing actual data has the list of columns as `['client_name', 'country', 'price', 'product']` (i.e. actual column names). What I am attempting to do is pass the associated `verbose_name` **Client Name** in place of column name **client_name** and so on. Is this possible? – Love Putin Not War Jun 07 '20 at 13:34
  • @user12379095 - If there is `tups = [('client_name', 'Client Name'), ('country', 'Country'), ('product', 'Product'), ('price', 'Price')]` `df = pd.DataFrame(columns=['client_name', 'country', 'price', 'product'])` then solution is convert tuples to dictioanry and use `rename` like `df = df.rename(columns=dict(tups))` – jezrael Jun 08 '20 at 05:10
  • After conversion, the **dict** looks like this: `{'client_id': 'Client ID', 'client_name': 'Client Name', 'country': 'Country', 'product': 'Product', 'price': 'Price.'}`. When I look at it, I find that the model's **primary key** `'client_id'` is also part of the dict (of course, it was part of the list as well). Could this be the reason why the `.rename` is not working? If so, **how do I get the `.rename` method to ignore the extra columns**? OR, preferably, ignore the columns that **are not part of the DF Columns itself**. – Love Putin Not War Jun 08 '20 at 10:59
  • @user12379095 - For first question test if no some traling whitespaces and for second remove from dictonary values by keys (so no replacement) – jezrael Jun 08 '20 at 11:03
  • Boss, I'm trying to upload data into Django table. The source DF will not have the pri. key as one of the cols (as pk is updated by the system during record creation). There may also be certain fileds which the user may not be interested in at the time of data import (and so would not be part of the source DF). **My Problem**: How do I compare the columns of table dataframe with the source DF? My query is basically **one**. During the `.rename`, how do I change the column names by saying something like : **if field_1 of target_DF appears in source_DF, change the col name to field_1 's value**. – Love Putin Not War Jun 08 '20 at 11:42
  • 1
    Finally got it done thanks to [this](https://stackoverflow.com/questions/42056275/comparing-list-against-dict-return-key-if-value-matches-list). Actually your first **hint** was what worked in the end. Thanks for all the help. And for bearing with. – Love Putin Not War Jun 08 '20 at 13:45
1

You can use dict comprehension inside df.rename():

idx_filter = np.asarray([i for i, col in enumerate(df.columns) if SOME_STRING_CONDITION in col])
df.rename(columns={col: '' for col in df.columns[idx_filter]}, inplace=True)

In your case, it sounds like SOME_STRING_CONDITION would be 'NULL' or 'UNNAMED'.

I figured this out while looking for help on a thread for a more generic column renaming issue (Renaming columns in pandas) for a problem of my own. I didn't have enough reputation to add my solution as an answer or comment (I'm new-ish on stackoverflow), so I am posting it here!

This solution is also helpful if you need to keep part of the string that you were filtering for. For example, if you wanted to replace the "C" columns with "col_":

idx_filter = np.asarray([i for i, col in enumerate(df.columns) if 'C' in col])
df.rename(columns={col: col.replace('C', 'col_') for col in df.columns[idx_filter]}, inplace=True)
BLT
  • 415
  • 3
  • 9
0

If you have few columns to retain their name. Use list-comprehension as below:

df.columns = [col if col in ('C1','C2') else "" for col in df.columns]
Sociopath
  • 13,068
  • 19
  • 47
  • 75
0
df.columns = [col if “NULL” not in col else “” for col in df.columns]

This should work, since you can change the column names by assinging list to the dataframe column variable.