
Say I have a dataframe like below:

df = pd.DataFrame({0:['Hello World!']}) # here df could have more than one column of data as shown below
df = pd.DataFrame({0:['Hello World!'], 1:['Hello Mars!']}) # or df could have more than one row of data as shown below
df = pd.DataFrame({0:['Hello World!', 'Hello Mars!']})

and I also have a list of column names like below:

new_col_names = ['a','b','c','d'] # here, len(new_col_names) might vary like below
new_col_names = ['a','b','c','d','e'] # but we can always be sure that the len(new_col_names) >= len(df.columns)

Given that, how could I replace the column names in df so that the result looks something like below:

df = pd.DataFrame({0:['Hello World!']})
new_col_names = ['a','b','c','d']
# result would be like this
a               b               c               d
Hello World!    (empty string)  (empty string)  (empty string)


df = pd.DataFrame({0:['Hello World!'], 1:['Hello Mars!']}) 
new_col_names = ['a','b','c','d']
# result would be like this
a               b               c               d
Hello World!    Hello Mars!     (empty string)  (empty string)


df = pd.DataFrame({0:['Hello World!', 'Hello Mars!']})
new_col_names = ['a','b','c','d','e']
# result would be like this
a               b               c               d               e
Hello World!    (empty string)  (empty string)  (empty string)  (empty string)
Hello Mars!     (empty string)  (empty string)  (empty string)  (empty string)

From reading around StackOverflow answers such as this, I have a vague idea that it could be something like below:

df[new_col_names] = '' # but this raises KeyError
# or this
df.columns=new_col_names # but this raises ValueError: Length mismatch (of course)

If someone could show me a way to overwrite the existing DataFrame column names and, at the same time, add new columns filled with empty strings, I'd greatly appreciate the help.

user1330974

3 Answers


The idea is to build a dictionary that maps the existing column names to the new ones with zip, rename only the existing columns, and then add all the new ones with DataFrame.reindex:

df = pd.DataFrame({0:['Hello World!', 'Hello Mars!']})
new_col_names = ['a','b','c','d','e']

df1 = (df.rename(columns=dict(zip(df.columns, new_col_names)))
        .reindex(new_col_names, axis=1, fill_value=''))
print (df1)
              a b c d e
0  Hello World!        
1   Hello Mars!      

Without fill_value, reindex fills the new columns with NaN instead:

df1 = (df.rename(columns=dict(zip(df.columns, new_col_names)))
         .reindex(new_col_names, axis=1))
print (df1)
              a   b   c   d   e
0  Hello World! NaN NaN NaN NaN
1   Hello Mars! NaN NaN NaN NaN  
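If you need this in several places, the same rename-then-reindex idea can be wrapped in a small helper. This is a sketch, not part of the original answer; the name rename_and_extend and its fill_value default are hypothetical, and it assumes len(new_col_names) >= len(df.columns) as the question guarantees.

```python
import pandas as pd


def rename_and_extend(df: pd.DataFrame, new_col_names: list, fill_value="") -> pd.DataFrame:
    """Hypothetical helper: rename existing columns positionally, then add the rest.

    Assumes len(new_col_names) >= len(df.columns).
    """
    # zip stops at the shorter sequence, so only the existing columns get a new name
    mapping = dict(zip(df.columns, new_col_names))
    # reindex keeps the renamed columns and creates the missing ones filled with fill_value
    return df.rename(columns=mapping).reindex(new_col_names, axis=1, fill_value=fill_value)


df = pd.DataFrame({0: ["Hello World!"], 1: ["Hello Mars!"]})
print(rename_and_extend(df, ["a", "b", "c", "d"]))
```

Passing fill_value=float("nan") (or any other scalar) reproduces the second variant shown above.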
jezrael
  • Thank you, your solution works! I've accepted yours since it was posted earlier than the others below and also because it is a neat one-liner. :) – user1330974 May 07 '20 at 15:42

Here is a function that will do what you want:

import pandas as pd

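# function: rename the existing columns positionally, then append any extra names as new (NaN) columns
def rename_add_col(df: pd.DataFrame, cols: list) -> pd.DataFrame:
    c_len = len(df.columns)
    df = df.copy()  # work on a copy so the caller's DataFrame is not modified
    if c_len == len(cols):
        df.columns = cols
    else:
        df.columns = cols[:c_len]
        # concatenating an empty frame that has only the extra columns adds them, filled with NaN
        df = pd.concat([df, pd.DataFrame(columns=cols[c_len:])])
    return df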

# create dataframe
t1 = pd.DataFrame({'a': ['1', '2', '3'], 'b': ['4', '5', '6'], 'c': ['7', '8', '9']})

    a   b   c
0   1   4   7
1   2   5   8
2   3   6   9

# call function
cols = ['d', 'e', 'f']
t1 = rename_add_col(t1, cols)

    d   e   f
0   1   4   7
1   2   5   8
2   3   6   9

# call function
cols = ['g', 'h', 'i', 'new1', 'new2']
t1 = rename_add_col(t1, cols)


    g   h   i   new1    new2
0   1   4   7    NaN     NaN
1   2   5   8    NaN     NaN
2   3   6   9    NaN     NaN
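The function above leaves NaN in the added columns, while the question asks for empty strings. A possible variant (the name rename_add_col_str is hypothetical, not from the answer) renames the existing columns the same way and then assigns '' to each extra column:

```python
import pandas as pd


def rename_add_col_str(df: pd.DataFrame, cols: list) -> pd.DataFrame:
    """Hypothetical variant of rename_add_col that fills the added columns with ''."""
    c_len = len(df.columns)
    out = df.copy()  # leave the caller's DataFrame untouched
    out.columns = cols[:c_len]  # rename the existing columns positionally
    for name in cols[c_len:]:
        out[name] = ""  # scalar assignment broadcasts '' to every row
    return out


t1 = pd.DataFrame({'a': ['1', '2', '3'], 'b': ['4', '5', '6'], 'c': ['7', '8', '9']})
print(rename_add_col_str(t1, ['g', 'h', 'i', 'new1', 'new2']))
```

Assigning column by column avoids the concat with an empty frame; for very many new columns, a single reindex with fill_value='' is likely faster.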
Trenton McKinney
  • Thank you! Your solution also works, but I've accepted @jezrael's answer because his was posted slightly earlier and that's what I ended up using. Thanks again! :) – user1330974 May 07 '20 at 15:43
  • @user1330974 That's fine, his is the more succinct answer and should be accepted. However, the older an answer's timestamp, the earlier it was submitted: Pranjal's was the first submission, mine was second, and jezrael's was last. But that shouldn't have a bearing. Glad we could solve your issue. – Trenton McKinney May 07 '20 at 21:48
  • Thank you for understanding and providing a working solution. I didn't realize StackOverflow sorts the answers from newest to oldest. I have been accepting answers with the wrong assumption for all these years. But thanks to your comment above, this ends today. :) – user1330974 May 08 '20 at 22:22
  • @user1330974 You should always just accept the answer that is best for you. You can accept an answer and then go back and accept a new answer if a better one has been provided. – Trenton McKinney May 08 '20 at 22:26

This might help you do it all at once.

Use your old DataFrame to create a new DataFrame with the pd.DataFrame() constructor, and add the new columns to the columns parameter by list concatenation.

Note: this adds the new columns with one value per index entry, but filled with NaN. To get empty strings instead, chain .fillna('') onto the result.

pd.DataFrame(df.to_dict(), columns=list(df.columns) + ['b', 'c'])
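As written, that call keeps the original label 0 instead of renaming it to 'a'. A sketch that combines this constructor approach with a rename step and the fillna('') workaround, assuming the question's sample data, might look like this:

```python
import pandas as pd

df = pd.DataFrame({0: ['Hello World!', 'Hello Mars!']})
new_col_names = ['a', 'b', 'c', 'd', 'e']

# rename the existing columns first; the constructor call alone would keep the old label 0
renamed = df.rename(columns=dict(zip(df.columns, new_col_names)))
# columns absent from the dict are created as NaN; fillna('') turns them into empty strings
out = pd.DataFrame(renamed.to_dict(), columns=new_col_names).fillna('')
print(out)
```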

Hope this helps! Cheers!