How to add an empty column to a dataframe?

Question

What's the easiest way to add an empty column to a pandas DataFrame object? The best I've stumbled upon is something like

df['foo'] = df.apply(lambda _: '', axis=1)

Is there a less perverse method?

Do you actually want a column containing empty strings or rather `N/A`? — filmor, May 01 '13 at 21:50

score 731 · Accepted Answer · edited Feb 16 '19 at 21:44

If I understand correctly, assignment should fill:

>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame({"A": [1,2,3], "B": [2,3,4]})
>>> df
   A  B
0  1  2
1  2  3
2  3  4
>>> df["C"] = ""
>>> df["D"] = np.nan
>>> df
   A  B C   D
0  1  2   NaN
1  2  3   NaN
2  3  4   NaN
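
To make the difference between the two fills concrete (the point of filmor's comment on the question), here is a minimal sketch reusing the frame above. It assumes "empty" means either an empty string or a missing value; which one fits depends on what the column will hold later.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [2, 3, 4]})
df["C"] = ""      # empty strings: real, zero-length values
df["D"] = np.nan  # NaN: missing values in a float column

# Empty strings are not treated as missing; NaN is.
print(df["C"].isna().any())  # False
print(df["D"].isna().all())  # True
print(df["D"].dtype)         # float64
```

So functions such as isna(), fillna() and dropna() will see column D as empty but not column C.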

edited Feb 16 '19 at 21:44

Jinhua Wang

answered May 01 '13 at 21:52

DSM

5

This answer just created new rows for me. – logicbloke May 16 '19 at 15:26
@logicbloke can you provide an example where this is happening? – craymichael Jun 13 '19 at 01:58
@craymichael It's been a while but I believe I had number-indexed columns with no names and named rows and it just created a new row at the end. – logicbloke Jun 13 '19 at 06:54
4

If the `df` is empty, you may want to use `df['new'] = pd.Series()` (see my answer below) – Carsten Jul 31 '19 at 15:00
5

how to add multiple empty columns? – M. Mariscal Feb 26 '20 at 10:24
@logicbloke, i think if you have a series object then the answer provided creates new rows. but using .to_frame() to convert the series object to a pandas dataframe allows to use the method to insert new columns – matthew May 11 '20 at 07:38
A similar approach gave me a warning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead – Gideon Kogan Nov 18 '20 at 12:25
12

@M.Mariscal `df[["newcol1","newcol2","newcol3"]] = None`. – Skippy le Grand Gourou Jun 04 '21 at 07:17
@SkippyleGrandGourou - the `df[["newcol1","newcol2","newcol3"]] = None` causes KeyError: "['newcol1' 'newcol2' 'newcol3'] not in index" in pandas 0.19.2. – sdbbs Mar 16 '22 at 00:04
@sdbbs Pandas 0.19.2 is a fairly old version, I wouldn’t be surprised if it didn’t support this syntax. – Skippy le Grand Gourou Mar 16 '22 at 09:37
1

@skippy-le-grand-gourou, this code will trigger a SettingWithCopyWarning warning. Do this instead: `df.loc[:, ["newcol1","newcol2","newcol3"]] = np.nan` – think Jul 13 '23 at 11:46

emunsing · Answer 2 · 2019-11-26T21:35:32.660

80

To add to DSM's answer and building on this associated question, I'd split the approach into two cases:

Adding a single column: Just assign empty values to the new column, e.g. df['C'] = np.nan
Adding multiple columns: I'd suggest using the .reindex(columns=[...]) method of pandas to add the new columns to the dataframe's column index. The same approach adds multiple new rows with .reindex(index=[...]). Note that newer versions of pandas (v>0.20) let you pass the labels together with an axis keyword instead of naming columns or index explicitly.

Here is an example adding multiple columns:

mydf = mydf.reindex(columns = mydf.columns.tolist() + ['newcol1','newcol2'])

or

mydf = mydf.reindex(mydf.columns.tolist() + ['newcol1','newcol2'], axis=1)  # version > 0.20.0

You can also always concatenate a new (empty) dataframe to the existing dataframe, but that doesn't feel as pythonic to me :)
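
As a runnable sketch of the two reindex spellings above: the sample frame and the name new_cols are made up for illustration, and the axis form assumes a pandas version that supports the axis keyword.

```python
import pandas as pd

mydf = pd.DataFrame({"A": [1, 2, 3], "B": [2, 3, 4]})  # sample data, assumed
new_cols = ["newcol1", "newcol2"]                       # hypothetical names

# Keyword form: name the target axis through `columns=`.
by_keyword = mydf.reindex(columns=mydf.columns.tolist() + new_cols)

# Positional labels plus `axis=1`.
by_axis = mydf.reindex(mydf.columns.tolist() + new_cols, axis=1)

# Both spellings give the same frame; the new columns are all NaN.
print(by_keyword.equals(by_axis))   # True
print(by_keyword.columns.tolist())  # ['A', 'B', 'newcol1', 'newcol2']

# `fill_value` swaps the NaN default for another placeholder.
blank = mydf.reindex(columns=mydf.columns.tolist() + new_cols, fill_value="")
```

Note that reindex returns a new frame, so the result has to be assigned back, as in the answer's own examples.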

edited Nov 26 '19 at 21:35

answered Sep 09 '16 at 06:56

emunsing

3

Example for `version >= 0.20.0` deletes the DataFrame and adds the new columns as rows. Example for `version < 0.20.0` works fine on Pandas Version `0.24.1` – Lalo Mar 11 '19 at 14:20
@emunsing While searching for an answer to this question, I ultimately found your answer helpful. At first, however, it wasn't working for me as Pandas requires `, axis=1` in `version = 0.25`. I attempted to modify your answer to include the updated version, but I was rejected by @kenlukas and @il_raffa. I hope everyone struggling to understand why your response isn't working for them--like I was--at least comes across this comment. – smgeneralist Nov 24 '19 at 14:15
@Griff - I've now updated my answer to be more accurate and explicit about version compatibility issues. Thanks for highlighting this. – emunsing Nov 26 '19 at 21:36

Carsten · Answer 3 · 2021-12-22T21:47:29.210

75

I like:

df['new'] = pd.Series(dtype='int')

# or use other dtypes like 'float', 'object', ...

If you have an empty dataframe, this solution makes sure that no new row containing only NaN is added.

Specifying dtype is not strictly necessary, however newer Pandas versions produce a DeprecationWarning if not specified.
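
A small sketch of the behaviour described above, under the assumption that the "empty dataframe" has columns but no rows. The frame contents and the column name "sized" are made up for illustration; the last line follows Wtower's suggestion in the comments of passing index=df.index.

```python
import pandas as pd

# Case 1: a frame with columns but no rows stays at zero rows.
empty = pd.DataFrame(columns=["A", "B"])
empty["new"] = pd.Series(dtype="float64")
print(len(empty))  # 0

# Case 2: on a populated frame the empty Series is aligned on the index,
# so every row of the new column is NaN.
df = pd.DataFrame({"A": [1, 2, 3]})
df["new"] = pd.Series(dtype="float64")
print(df["new"].isna().all())  # True

# Passing the frame's index sizes the Series up front.
df["sized"] = pd.Series(index=df.index, dtype="float64")
```

Keep in mind that an integer dtype cannot hold NaN, so on a frame that already has rows a float (or nullable) dtype is the safer choice.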

edited Dec 22 '21 at 21:47

answered Jul 31 '19 at 14:59

Carsten

3

This is the best way to insert a new column with predefined dtype. – normanius Apr 12 '21 at 18:43
1

Totally agree. If for any reason you need to size the new series to any given `df`, you can add `index=df.index`. – Wtower Jun 19 '22 at 10:58

score 63 · Answer 4 · edited May 16 '17 at 08:29

an even simpler solution is:

df = df.reindex(columns = header_list)

where "header_list" is the list of headers you want to appear.

Any header in the list that is not already in the dataframe will be added as a column of blank (NaN) cells. Note that existing columns missing from the list are dropped.

so if

header_list = ['a','b','c', 'd']

then c and d will be added as columns of blank (NaN) cells
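
A minimal sketch of this, assuming a frame that also has a column x that is not in header_list. It shows that reindex both adds the missing headers and drops anything the list leaves out, so the list should contain every column you want to keep.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "x": [5, 6]})  # sample data, assumed
header_list = ["a", "b", "c", "d"]

result = df.reindex(columns=header_list)

print(result.columns.tolist())   # ['a', 'b', 'c', 'd']
print("x" in result.columns)     # False: 'x' was not in header_list
print(result["c"].isna().all())  # True: new columns hold NaN
```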

edited May 16 '17 at 08:29

maazza

answered May 16 '17 at 08:08

liana

6

More precisely, the columns will be added with NaNs. – broccoli2000 Aug 01 '17 at 14:18

score 44 · Answer 5 · edited Jun 28 '22 at 08:05

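Starting with v0.16.0, DF.assign() can be used to assign new columns (single or multiple) to a DF. The new columns are appended at the end of the DF; older versions ordered them alphabetically, while pandas 0.23+ on Python 3.6+ keeps the order of the keyword arguments.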

This becomes advantageous compared to simple assignment in cases wherein you want to perform a series of chained operations directly on the returned dataframe.

Consider the same DF sample demonstrated by @DSM:

df = pd.DataFrame({"A": [1,2,3], "B": [2,3,4]})
df
Out[18]:
   A  B
0  1  2
1  2  3
2  3  4

df.assign(C="",D=np.nan)
Out[21]:
   A  B C   D
0  1  2   NaN
1  2  3   NaN
2  3  4   NaN

Note that this returns a copy with all the previous columns along with the newly created ones. To modify the original DF, reassign the result, e.g. df = df.assign(...), since assign does not support in-place operation.
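
To illustrate the chaining advantage mentioned above, here is a short sketch. The follow-up rename step is only an assumed example of "a further operation", not something from the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [2, 3, 4]})

# assign() returns a new frame, so further calls can be chained onto it.
result = (
    df.assign(C="", D=np.nan)
      .rename(columns=str.lower)
)

print(result.columns.tolist())  # ['a', 'b', 'c', 'd']
print(df.columns.tolist())      # ['A', 'B'] (original is untouched)
```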

What is that datatype for C? I am trying to add by looping through a list of strings. But it does not use it. — eleijonmarck, Oct 24 '17 at 11:04

Ankush Rathour · Answer 6 · 2023-04-06T13:27:13.450

11

df["C"] = ""
df["D"] = np.nan

Assignment may give you a SettingWithCopyWarning (typically when df is itself a slice of another DataFrame):

A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead

so it's better to use insert:

df.insert(loc, column, value)
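
As a hedged sketch of one common way to sidestep the warning: take an explicit copy of the slice before adding columns. The names parent and subset and the filter condition are made up for illustration; whether you see the warning at all depends on how your frame was created and on the pandas version.

```python
import numpy as np
import pandas as pd

parent = pd.DataFrame({"A": [1, 2, 3], "B": [2, 3, 4]})  # sample data, assumed

# Taking an explicit copy makes the subset an independent frame,
# so adding columns to it is unambiguous.
subset = parent[parent["A"] > 1].copy()
subset["C"] = ""
subset["D"] = np.nan

# insert() adds a column too, and lets you choose its position.
subset.insert(0, "first", np.nan)

print(subset.columns.tolist())  # ['first', 'A', 'B', 'C', 'D']
print(parent.columns.tolist())  # ['A', 'B']
```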

edited Apr 06 '23 at 13:27

answered Jun 17 '22 at 07:51

Ankush Rathour

score 7 · Answer 7 · edited Jul 13 '19 at 16:41

If you want to add columns whose names come from a list:

df=pd.DataFrame()
a=['col1','col2','col3','col4']
for i in a:
    df[i]=np.nan
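
When the frame starts out empty, as in the loop above, a possible one-step alternative is to pass the list to the constructor. This is a sketch, not a drop-in replacement: the resulting column dtypes may differ from those produced by the loop.

```python
import pandas as pd

a = ["col1", "col2", "col3", "col4"]

# Build the empty frame with all columns in one step.
df = pd.DataFrame(columns=a)

print(df.columns.tolist())  # ['col1', 'col2', 'col3', 'col4']
print(len(df))              # 0
```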

edited Jul 13 '19 at 16:41

Varun Gupta

answered Mar 22 '18 at 04:30

Joy Mazumder

score 6 · Answer 8 · edited May 23 '17 at 12:34

@emunsing's answer is really cool for adding multiple columns, but I couldn't get it to work for me in python 2.7. Instead, I found this works:

mydf = mydf.reindex(columns = np.append(mydf.columns.values, ['newcol1','newcol2']))

edited May 23 '17 at 12:34

Community

answered Apr 17 '17 at 13:23

edge-case

1

Please don't use Python 2.7... – Michael Currie Jul 20 '22 at 07:27

score 6 · Answer 9 · answered Apr 09 '20 at 10:30

One can use df.insert(index_to_insert_at, column_header, init_value) to insert new column at a specific index.

cost_tbl.insert(1, "col_name", "")

The above statement would insert an empty Column after the first column.
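
A self-contained version of the call above, with a made-up cost_tbl so the resulting column order can be checked. It also shows that insert works in place and rejects a name that already exists.

```python
import pandas as pd

cost_tbl = pd.DataFrame({"item": ["a", "b"], "cost": [1.0, 2.0]})  # sample data, assumed

# Position 1 places the new column between 'item' and 'cost'.
cost_tbl.insert(1, "col_name", "")
print(cost_tbl.columns.tolist())  # ['item', 'col_name', 'cost']

# insert() modifies the frame in place and refuses an existing name
# unless allow_duplicates=True is passed.
try:
    cost_tbl.insert(0, "col_name", "")
except ValueError as exc:
    print("refused:", exc)
```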

answered Apr 09 '20 at 10:30

Usman Ahmad

score 4 · Answer 10 · answered Jun 10 '21 at 06:26

this will also work for multiple columns:

df = pd.DataFrame({"A": [1,2,3], "B": [2,3,4]})
>>> df
   A  B
0  1  2
1  2  3
2  3  4

df1 = pd.DataFrame(columns=['C','D','E'])
df = df.join(df1, how="outer")

>>>df
    A   B   C   D   E
0   1   2   NaN NaN NaN
1   2   3   NaN NaN NaN
2   3   4   NaN NaN NaN

Then do whatever you want to do with the columns pd.Series.fillna(),pd.Series.map() etc.

answered Jun 10 '21 at 06:26

Talis

how efficient is that? – Leonardo Cló Jul 28 '21 at 21:31
https://stackoverflow.com/questions/51715082/what-is-the-running-time-big-o-order-of-pandas-dataframe-join if you join on actual data it's O(n log(n)) , my assumption is since the df is empty, max O(n) – Talis Jul 29 '21 at 08:01

score 3 · Answer 11 · answered Dec 20 '19 at 10:06

You can do

df['column'] = None #This works. This will create a new column with None type
df.column = None #This will work only when the column is already present in the dataframe
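
The difference between the two lines can be checked with a short sketch. The frame and the name newcol are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3]})

# Attribute assignment with a new name sets a plain Python attribute;
# no column is created.
df.newcol = None
print("newcol" in df.columns)  # False

# Item assignment creates the column.
df["column"] = None
print("column" in df.columns)  # True

# Once the column exists, attribute assignment updates it.
df.column = 0
print(df["column"].tolist())  # [0, 0, 0]
```

For that reason the bracket form is the safer habit when adding columns.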

answered Dec 20 '19 at 10:06

Bharath_Raja

score 3 · Answer 12 · edited Jun 20 '23 at 23:50

If you have a list of columns that you want to be empty, you can use assign with a dict comprehension and dict unpacking.

>>> df = pd.DataFrame({"A": [1,2,3], "B": [2,3,4]})
>>> nan_cols_name = ["C","D","whatever"]
>>> df.assign(**{col:np.nan for col in nan_cols_name})

   A  B   C   D  whatever
0  1  2 NaN NaN       NaN
1  2  3 NaN NaN       NaN
2  3  4 NaN NaN       NaN

If you want different values for different columns, you can merge several dict comprehensions into one dict and unpack that.

df = pd.DataFrame({"A": [1,2,3], "B": [2,3,4]})
nan_cols_name = ["C","D","whatever"]
empty_string_cols_name = ["E","F","bad column with space"]
df = df.assign(**{
    **{col: np.nan for col in nan_cols_name},
    **{col: "" for col in empty_string_cols_name},
})

score 2 · Answer 13 · answered Sep 12 '19 at 11:48

The code below addresses the question "How do I add n number of empty columns to my existing dataframe". In the interest of keeping solutions to similar problems in one place, I am adding it here.

Approach 1 (to create 64 additional columns with column names from 1-64)

m = list(range(1,65,1)) 
dd=pd.DataFrame(columns=m)
df.join(dd).replace(np.nan,'') #df is the dataframe that already exists

Approach 2 (to create 64 additional columns with column names from 1-64)

df.reindex(df.columns.tolist() + list(range(1,65,1)), axis=1).replace(np.nan,'')

score 0 · Answer 14 · edited Aug 15 '20 at 09:49

Sorry that I did not explain my answer well at the beginning. There is another way to add a new column to an existing dataframe. First, make a new empty data frame called df_temp, with all the columns of your data frame plus the one or more columns you want to add. Second, combine df_temp with your data frame.

df_temp = pd.DataFrame(columns=(df.columns.tolist() + ['empty']))
df = pd.concat([df_temp, df])

It might not be the best solution, but it is another way to think about this question.

The reason I am using this method is that I get this warning all the time:

: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df["empty1"], df["empty2"] = [np.nan, ""]

I also found a way to disable the warning, though this silences it globally rather than fixing the cause:

pd.options.mode.chained_assignment = None

Ok so... when giving an answer, please give some info on what is happening line by line if possible. Otherwise the person asking the question won't learn from it: he will copy and paste, his code will work, and he won't know why. So I suggest adding a bit more info. – , Aug 09 '20 at 03:30

score 0 · Answer 15 · answered Feb 16 '21 at 19:08

The reason I was looking for such a solution is simply to add spaces between multiple DFs which have been joined column-wise using the pd.concat function and then written to excel using xlsxwriter.

df[' ']=df.apply(lambda _: '', axis=1)
df_2 = pd.concat([df,df1],axis=1)                #worked but only once. 
# Note: df & df1 have the same rows which is my index. 
#
df_2[' ']=df_2.apply(lambda _: '', axis=1)       #didn't work this time !!?     
df_4 = pd.concat([df_2,df_3],axis=1)

I then replaced the second lambda call with

df_2['']=''                                 #which appears to add a blank column
df_4 = pd.concat([df_2,df_3],axis=1)

I tested the output by writing it to Excel with xlsxwriter. In Jupyter the blank columns look the same as in Excel, although without the xlsx formatting. I'm not sure why the second lambda call didn't work.
