120

How do I add multiple empty columns to a DataFrame from a list?

I can do:

df["B"] = None
df["C"] = None
df["D"] = None

But I can't do:

df[["B", "C", "D"]] = None

KeyError: "['B' 'C' 'D'] not in index"

Mykola Zotko
  • 15,583
  • 3
  • 71
  • 73
P A N
  • 5,642
  • 15
  • 52
  • 103
  • `None` is different to 0, but some answers are assuming it's equivalent. Also, assigning `None` will give a dtype of object, but assigning 0 will give a dtype of int. – smci Apr 19 '20 at 11:11
  • Also you can't do `df[['B','C','D']] = None, None, None` or `[None, None, None]` or `pd.DataFrame([None, None, None])` – smci Apr 19 '20 at 11:13
  • Related : the more general [How to add multiple columns to pandas dataframe in one assignment?](https://stackoverflow.com/questions/39050539/how-to-add-multiple-columns-to-pandas-dataframe-in-one-assignment) – smci Apr 19 '20 at 11:16

8 Answers8

121

You could use df.reindex to add new columns:

In [18]: df = pd.DataFrame(np.random.randint(10, size=(5,1)), columns=['A'])

In [19]: df
Out[19]: 
   A
0  4
1  7
2  0
3  7
4  6

In [20]: df.reindex(columns=list('ABCD'))
Out[20]: 
   A   B   C   D
0  4 NaN NaN NaN
1  7 NaN NaN NaN
2  0 NaN NaN NaN
3  7 NaN NaN NaN
4  6 NaN NaN NaN

reindex will return a new DataFrame, with columns appearing in the order they are listed:

In [31]: df.reindex(columns=list('DCBA'))
Out[31]: 
    D   C   B  A
0 NaN NaN NaN  4
1 NaN NaN NaN  7
2 NaN NaN NaN  0
3 NaN NaN NaN  7
4 NaN NaN NaN  6

The reindex method as a fill_value parameter as well:

In [22]: df.reindex(columns=list('ABCD'), fill_value=0)
Out[22]: 
   A  B  C  D
0  4  0  0  0
1  7  0  0  0
2  0  0  0  0
3  7  0  0  0
4  6  0  0  0
Dror
  • 12,174
  • 21
  • 90
  • 160
unutbu
  • 842,883
  • 184
  • 1,785
  • 1,677
  • 7
    After experimenting with a moderately large Data Frame (~2.5k rows for 80k columns), and this solution appears to be orders of magnitude faster than the accepted one.BTW is there a reason why this specific command does not accept an "inplace=True" parameter? df = df.reindex(...) appears to use up quite a bit of RAM. – Marco Spinaci Sep 14 '17 at 15:05
  • 7
    @MarcoSpinaci: I recommend never using `inplace=True`. It doesn't do what most people think it does. Under the hood, an entirely new DataFrame is always created, and then the data from the new DataFrame is copied into the original DataFrame. That doesn't save any memory. So `inplace=True` is window-dressing without substance, and moreover, is misleadingly named. I haven't checked the code, but I expect `df = df.reindex(...)` requires at least 2x the memory required for `df`, and of course more when `reindex` is used to expand the number of rows. – unutbu Sep 14 '17 at 16:06
  • @unutbu, nevertheless, it is useful when you are iterating containers, e.g. a list or a dictionary, it would avoid the use of indexes that makes the code a bit more dirty... – toto_tico Mar 27 '18 at 12:36
  • @unutbu it is indeed a lot faster when i profiled my ~200 columns creation code, could you briefly explain why doing reindex is much faster than concat or simply setting multiple columns to a numpy array? – Sam Oct 16 '20 at 19:13
88

I'd concat using a DataFrame:

In [23]:
df = pd.DataFrame(columns=['A'])
df

Out[23]:
Empty DataFrame
Columns: [A]
Index: []

In [24]:    
pd.concat([df,pd.DataFrame(columns=list('BCD'))])

Out[24]:
Empty DataFrame
Columns: [A, B, C, D]
Index: []

So by passing a list containing your original df, and a new one with the columns you wish to add, this will return a new df with the additional columns.


Caveat: See the discussion of performance in the other answers and/or the comment discussions. reindex may be preferable where performance is critical.

floer32
  • 2,190
  • 4
  • 29
  • 50
EdChum
  • 376,765
  • 198
  • 813
  • 562
47

If you don't want to rewrite the name of the old columns, then you can use reindex:

df.reindex(columns=[*df.columns.tolist(), 'new_column1', 'new_column2'], fill_value=0)

Full example:

In [1]: df = pd.DataFrame(np.random.randint(10, size=(3,1)), columns=['A'])

In [1]: df
Out[1]: 
   A
0  4
1  7
2  0

In [2]: df.reindex(columns=[*df.columns.tolist(), 'col1', 'col2'], fill_value=0)
Out[2]: 

   A  col1  col2
0  1     0     0
1  2     0     0

And, if you already have a list with the column names, :

In [3]: my_cols_list=['col1','col2']

In [4]: df.reindex(columns=[*df.columns.tolist(), *my_cols_list], fill_value=0)
Out[4]: 
   A  col1  col2
0  1     0     0
1  2     0     0
toto_tico
  • 17,977
  • 9
  • 97
  • 116
11

Summary of alternative solutions:

columns_add = ['a', 'b', 'c']
  1. for loop:

    for newcol in columns_add:
        df[newcol]= None
    
  2. dict method:

    df.assign(**dict([(_,None) for _ in columns_add]))
    
  3. tuple assignment:

    df['a'], df['b'], df['c'] = None, None, None
    
Yonas Kassa
  • 3,362
  • 1
  • 18
  • 27
9

Why not just use loop:

for newcol in ['B','C','D']:
    df[newcol]=np.nan
alexprice
  • 394
  • 4
  • 12
  • 0 is not the same value as None. Also, it'll force the dtype to integer, whereas None won't. – smci Apr 19 '20 at 11:03
7

You can make use of Pandas broadcasting:

df = pd.DataFrame({'A': [1, 1, 1]})

df[['B', 'C']] = 2, 3
# df[['B', 'C']] = [2, 3]

Result:

   A  B  C
0  1  2  3
1  1  2  3
2  1  2  3

To add empty columns:

df[['B', 'C', 'D']] = 3 * [np.nan]

Result:

   A   B   C   D
0  1 NaN NaN NaN
1  1 NaN NaN NaN
2  1 NaN NaN NaN
Mykola Zotko
  • 15,583
  • 3
  • 71
  • 73
3

I'd use

df["B"], df["C"], df["D"] = None, None, None

or

df["B"], df["C"], df["D"] = ["None" for a in range(3)]
jizhihaoSAMA
  • 12,336
  • 9
  • 27
  • 49
1

Just to add to the list of funny ways:

columns_add = ['a', 'b', 'c']
df = df.assign(**dict(zip(columns_add, [0] * len(columns_add)))
Oleg O
  • 1,005
  • 6
  • 11