Add multiple empty columns to pandas DataFrame

Question

How do I add multiple empty columns to a DataFrame from a list?

I can do:

df["B"] = None
df["C"] = None
df["D"] = None

But I can't do:

df[["B", "C", "D"]] = None

KeyError: "['B' 'C' 'D'] not in index"

`None` is different to 0, but some answers are assuming it's equivalent. Also, assigning `None` will give a dtype of object, but assigning 0 will give a dtype of int. — smci, Apr 19 '20 at 11:11
Also you can't do `df[['B','C','D']] = None, None, None` or `[None, None, None]` or `pd.DataFrame([None, None, None])` — smci, Apr 19 '20 at 11:13
Related : the more general [How to add multiple columns to pandas dataframe in one assignment?](https://stackoverflow.com/questions/39050539/how-to-add-multiple-columns-to-pandas-dataframe-in-one-assignment) — smci, Apr 19 '20 at 11:16

score 121 · Answer 1 · edited Sep 12 '17 at 13:48

121

You could use df.reindex to add new columns:

In [18]: df = pd.DataFrame(np.random.randint(10, size=(5,1)), columns=['A'])

In [19]: df
Out[19]: 
   A
0  4
1  7
2  0
3  7
4  6

In [20]: df.reindex(columns=list('ABCD'))
Out[20]: 
   A   B   C   D
0  4 NaN NaN NaN
1  7 NaN NaN NaN
2  0 NaN NaN NaN
3  7 NaN NaN NaN
4  6 NaN NaN NaN

reindex will return a new DataFrame, with columns appearing in the order they are listed:

In [31]: df.reindex(columns=list('DCBA'))
Out[31]: 
    D   C   B  A
0 NaN NaN NaN  4
1 NaN NaN NaN  7
2 NaN NaN NaN  0
3 NaN NaN NaN  7
4 NaN NaN NaN  6

The reindex method as a fill_value parameter as well:

In [22]: df.reindex(columns=list('ABCD'), fill_value=0)
Out[22]: 
   A  B  C  D
0  4  0  0  0
1  7  0  0  0
2  0  0  0  0
3  7  0  0  0
4  6  0  0  0

edited Sep 12 '17 at 13:48

Dror

12,174
21
90
160

answered Jun 19 '15 at 17:00

unutbu

842,883
184
1,785
1,677

7

After experimenting with a moderately large Data Frame (~2.5k rows for 80k columns), and this solution appears to be orders of magnitude faster than the accepted one.BTW is there a reason why this specific command does not accept an "inplace=True" parameter? df = df.reindex(...) appears to use up quite a bit of RAM. – Marco Spinaci Sep 14 '17 at 15:05
7

@MarcoSpinaci: I recommend never using `inplace=True`. It doesn't do what most people think it does. Under the hood, an entirely new DataFrame is always created, and then the data from the new DataFrame is copied into the original DataFrame. That doesn't save any memory. So `inplace=True` is window-dressing without substance, and moreover, is misleadingly named. I haven't checked the code, but I expect `df = df.reindex(...)` requires at least 2x the memory required for `df`, and of course more when `reindex` is used to expand the number of rows. – unutbu Sep 14 '17 at 16:06
@unutbu, nevertheless, it is useful when you are iterating containers, e.g. a list or a dictionary, it would avoid the use of indexes that makes the code a bit more dirty... – toto_tico Mar 27 '18 at 12:36
@unutbu it is indeed a lot faster when i profiled my ~200 columns creation code, could you briefly explain why doing reindex is much faster than concat or simply setting multiple columns to a numpy array? – Sam Oct 16 '20 at 19:13

score 88 · Accepted Answer · edited Dec 19 '18 at 23:08

I'd concat using a DataFrame:

In [23]:
df = pd.DataFrame(columns=['A'])
df

Out[23]:
Empty DataFrame
Columns: [A]
Index: []

In [24]:    
pd.concat([df,pd.DataFrame(columns=list('BCD'))])

Out[24]:
Empty DataFrame
Columns: [A, B, C, D]
Index: []

So by passing a list containing your original df, and a new one with the columns you wish to add, this will return a new df with the additional columns.

Caveat: See the discussion of performance in the other answers and/or the comment discussions. reindex may be preferable where performance is critical.

toto_tico · Answer 3 · 2017-12-05T09:30:05.727

If you don't want to rewrite the name of the old columns, then you can use reindex:

df.reindex(columns=[*df.columns.tolist(), 'new_column1', 'new_column2'], fill_value=0)

Full example:

In [1]: df = pd.DataFrame(np.random.randint(10, size=(3,1)), columns=['A'])

In [1]: df
Out[1]: 
   A
0  4
1  7
2  0

In [2]: df.reindex(columns=[*df.columns.tolist(), 'col1', 'col2'], fill_value=0)
Out[2]: 

   A  col1  col2
0  1     0     0
1  2     0     0

And, if you already have a list with the column names, :

In [3]: my_cols_list=['col1','col2']

In [4]: df.reindex(columns=[*df.columns.tolist(), *my_cols_list], fill_value=0)
Out[4]: 
   A  col1  col2
0  1     0     0
1  2     0     0

score 11 · Answer 4 · answered Jul 15 '20 at 15:33

11

Summary of alternative solutions:

columns_add = ['a', 'b', 'c']

for loop:

for newcol in columns_add:
    df[newcol]= None

dict method:

df.assign(**dict([(_,None) for _ in columns_add]))

tuple assignment:

df['a'], df['b'], df['c'] = None, None, None

answered Jul 15 '20 at 15:33

Yonas Kassa

3,362
1
18
27

1

`df.assign(**dict.fromkeys(columns_add, None))` should also work – Joe Ferndz Dec 28 '20 at 09:52

alexprice · Answer 5 · 2020-06-06T15:50:36.820

9

Why not just use loop:

for newcol in ['B','C','D']:
    df[newcol]=np.nan

edited Jun 06 '20 at 15:50

answered May 04 '19 at 17:04

alexprice

394
4
12

0 is not the same value as None. Also, it'll force the dtype to integer, whereas None won't. – smci Apr 19 '20 at 11:03

Mykola Zotko · Answer 6 · 2021-09-09T07:40:59.177

7

You can make use of Pandas broadcasting:

df = pd.DataFrame({'A': [1, 1, 1]})

df[['B', 'C']] = 2, 3
# df[['B', 'C']] = [2, 3]

Result:

To add empty columns:

df[['B', 'C', 'D']] = 3 * [np.nan]

Result:

   A   B   C   D
0  1 NaN NaN NaN
1  1 NaN NaN NaN
2  1 NaN NaN NaN

edited Sep 09 '21 at 07:40

answered Sep 09 '21 at 07:12

Mykola Zotko

15,583
3
71
73

score 3 · Answer 7 · edited Jun 22 '20 at 03:14

3

I'd use

df["B"], df["C"], df["D"] = None, None, None

or

df["B"], df["C"], df["D"] = ["None" for a in range(3)]

edited Jun 22 '20 at 03:14

jizhihaoSAMA

12,336
9
27
49

answered Jun 22 '20 at 02:38

lumiere_profues

31
1

score 1 · Answer 8 · answered Nov 20 '19 at 12:26

1

Just to add to the list of funny ways:

columns_add = ['a', 'b', 'c']
df = df.assign(**dict(zip(columns_add, [0] * len(columns_add)))

answered Nov 20 '19 at 12:26

Oleg O

1,005
6
11

0 is not the same value as None. Also, it'll force the dtype to integer, whereas None won't. – smci Apr 19 '20 at 11:03
Anyway you're missing a trailing fourth close-parenthesis. – smci Apr 19 '20 at 11:04

Add multiple empty columns to pandas DataFrame

8 Answers8

Linked

Related