I have the following DataFrame:

import pandas as pd

list_columns = ['A', 'B', 'C']
list_data = [
    [1, '2', 3],
    [4, '4', 5],
    [1, '2', 3],
    [4, '4', 6],
]
df = pd.DataFrame(columns=list_columns, data=list_data)

I want to check whether multiple columns exist and, if they do not, create them.

For example, if B, C, and D do not exist, create them (for the DataFrame above, only column D would be created). I know how to do this for a single column:

if 'D' not in df:
    df['D'] = 0

Is there a way to check whether all of my columns exist and create the ones that are missing, without writing a separate if for each column?

Christian

2 Answers

A loop is not necessary here. Use DataFrame.reindex with Index.union, which keeps the existing columns and adds any missing ones filled with fill_value:

cols = ['B', 'C', 'D']

df = df.reindex(df.columns.union(cols, sort=False), axis=1, fill_value=0)
print(df)
   A  B  C  D
0  1  2  3  0
1  4  4  5  0
2  1  2  3  0
3  4  4  6  0
jezrael
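
If this check is needed in several places, the reindex approach above could be wrapped in a small helper. The following is only a sketch: the function name `ensure_columns` and its `fill_value` default are assumptions made for illustration, not part of pandas or of the answer above.

```python
import pandas as pd


def ensure_columns(df, cols, fill_value=0):
    """Return a new DataFrame that contains every name in cols as a column.

    Hypothetical helper for illustration; existing columns keep their values,
    and only the missing ones are created and filled with fill_value.
    """
    # union(..., sort=False) keeps the existing columns and appends the missing names
    all_cols = df.columns.union(cols, sort=False)
    return df.reindex(columns=all_cols, fill_value=fill_value)


df = pd.DataFrame(
    columns=['A', 'B', 'C'],
    data=[[1, '2', 3], [4, '4', 5], [1, '2', 3], [4, '4', 6]],
)

result = ensure_columns(df, ['B', 'C', 'D'])
print(result)
```

Because reindex returns a new object, the original `df` is left unchanged unless the result is assigned back to it.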

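Just to add: you can compute the set difference between your list and the existing columns, then pass the missing names to assign using ** unpacking. Note that assign returns a new DataFrame, so assign the result back to df if you want to keep it.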

import numpy as np
cols = ['B', 'C', 'D', 'E']

df.assign(**{col: 0 for col in np.setdiff1d(cols, df.columns.values)})

   A  B  C  D  E
0  1  2  3  0  0
1  4  4  5  0  0
2  1  2  3  0  0
3  4  4  6  0  0
Umar.H
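
One detail worth knowing about the approach above: np.setdiff1d returns its result sorted, so the new columns are appended in sorted order rather than in the order given in cols. If the list order matters, a plain list comprehension can stand in for np.setdiff1d. This is a minimal sketch of that variant, assuming all column names are strings (required for ** unpacking); the variable name `missing` is only illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    columns=['A', 'B', 'C'],
    data=[[1, '2', 3], [4, '4', 5], [1, '2', 3], [4, '4', 6]],
)

# Deliberately unsorted to show that the list order is preserved
cols = ['E', 'B', 'D']

# Keep only the names that are not already columns, in the order given
missing = [col for col in cols if col not in df.columns]

# assign returns a new DataFrame, so the result is bound back to df
df = df.assign(**{col: 0 for col in missing})
print(df.columns.tolist())  # ['A', 'B', 'C', 'E', 'D']
```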