
I have an empty dataframe.

import pandas as pd
df = pd.DataFrame(columns=['a'])

For some reason I want to generate df2, another empty DataFrame, with two columns, 'a' and 'b'.

If I do

df.columns=df.columns+'b'

it does not work (the existing column is renamed to 'ab' instead), and neither does the following:

df.columns=df.columns.tolist()+['b']
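For reference, here is a minimal sketch of why both attempts fail, run against the same one-column frame: `Index + str` is applied element-wise, so it renames labels, and assigning to `.columns` only relabels existing columns, so the number of labels must match. The names `renamed` and `raised` are just for illustration.

```python
import pandas as pd

df = pd.DataFrame(columns=['a'])

# Index + str is applied element-wise: 'a' + 'b' gives 'ab',
# so this produces a renamed label rather than an extra column.
renamed = df.columns + 'b'
print(list(renamed))  # ['ab']

# Assigning to .columns relabels the existing columns; passing two labels
# for a frame that has one column raises a length-mismatch ValueError.
raised = False
try:
    df.columns = df.columns.tolist() + ['b']
except ValueError as err:
    raised = True
    print('ValueError:', err)
```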

How can I add a separate column 'b' to df while df.empty stays True?

Using .loc is also not possible:

df.loc[:, 'b'] = None

as it raises

  Cannot set dataframe with no defined index and a scalar
janniks
00__00__00

5 Answers

Here are a few ways to add an empty column to an empty DataFrame:

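import pandas as pd

df = pd.DataFrame(columns=['a'])
df['b'] = None                       # scalar assignment
df = df.assign(c=None)               # assign returns a new frame
df = df.assign(d=df['a'])            # copy of an existing (empty) column
df['e'] = pd.Series(index=df.index, dtype=object)  # explicit dtype avoids the empty-Series dtype warning in pandas 1.x
df = pd.concat([df, pd.DataFrame(columns=list('f'))])
print(df)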

Output:

Empty DataFrame
Columns: [a, b, c, d, e, f]
Index: []

I hope it helps.
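A small sketch, assuming you want to confirm the requirement from the question (df.empty stays True) for several of these approaches, each applied to a fresh copy of the same empty frame. The helper names (`add_by_setitem`, `add_by_series`, `approaches`) are hypothetical, not part of pandas.

```python
import pandas as pd


def add_by_setitem(frame):
    # Work on a copy so the shared base frame is not mutated.
    out = frame.copy()
    out['b'] = None
    return out


def add_by_series(frame):
    out = frame.copy()
    # An empty Series aligned on the (empty) index; dtype given explicitly.
    out['b'] = pd.Series(index=out.index, dtype=object)
    return out


# Hypothetical mapping of labels to approaches taken from the answer above.
approaches = {
    'setitem': add_by_setitem,
    'assign': lambda frame: frame.assign(b=None),
    'series': add_by_series,
    'concat': lambda frame: pd.concat([frame, pd.DataFrame(columns=['b'])]),
}

base = pd.DataFrame(columns=['a'])
results = {name: fn(base) for name, fn in approaches.items()}

for name, out in results.items():
    # Each result should have columns a and b, and still no rows.
    print(name, list(out.columns), out.empty)
```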

Sumit Jha

If you just do df['b'] = None, then df.empty is still True and df is:

Empty DataFrame
Columns: [a, b]
Index: []

EDIT: To create an empty df2 from the columns of df plus some new columns, you can do:

df2 = pd.DataFrame(columns = df.columns.tolist() + ['b', 'c', 'd'])
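A quick check, under the same setup as the question, that this leaves df untouched and produces an empty df2:

```python
import pandas as pd

df = pd.DataFrame(columns=['a'])

# Build df2 from df's labels plus the new ones; df itself is not modified.
df2 = pd.DataFrame(columns=df.columns.tolist() + ['b', 'c', 'd'])

print(list(df.columns))   # ['a']
print(list(df2.columns))  # ['a', 'b', 'c', 'd']
print(df2.empty)          # True
```

Because df2 is built from the labels only, it does not carry over any dtypes df's columns may have had.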
Ben.T
  • 29,160
  • 6
  • 32
  • 54
8

If you want to add multiple columns at the same time, you can also use reindex.

new_cols = ['c', 'd', 'e', 'f', 'g']
df2 = df.reindex(df.columns.union(new_cols), axis=1)

#Empty DataFrame
#Columns: [a, c, d, e, f, g]
#Index: []
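One thing to be aware of: `Index.union` drops duplicate labels but also sorts them by default, so new columns can end up in a different order than listed. A minimal sketch with hypothetical names in new_cols, comparing it with an order-preserving alternative (`dict.fromkeys`, a plain Python idiom, not what the answer uses):

```python
import pandas as pd

df = pd.DataFrame(columns=['a'])
new_cols = ['z', 'b', 'a']  # hypothetical: out of order and repeating 'a'

# union removes the repeated 'a' but also sorts the labels.
sorted_cols = df.columns.union(new_cols)
print(list(sorted_cols))  # ['a', 'b', 'z']

# dict.fromkeys keeps the first occurrence of each label, in order.
ordered_cols = list(dict.fromkeys(df.columns.tolist() + new_cols))
print(ordered_cols)  # ['a', 'z', 'b']

# reindex adds the missing columns; with no rows the frame stays empty.
df2 = df.reindex(ordered_cols, axis=1)
print(list(df2.columns), df2.empty)  # ['a', 'z', 'b'] True
```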
ALollz
  • Yeah, I like `union` better. It avoids the possibility of having two similarly named columns in the `df` – ALollz May 16 '18 at 13:52
  • @piRSquared I think maybe using concat can combine the `reindex` and `union` – BENY May 16 '18 at 14:06
  • @Wen I'm sure you're right. However, that requires constructing a new dataframe simply to concat. I tend to avoid constructing new pandas objects if it isn't necessary. – piRSquared May 16 '18 at 14:09

This is one way:

df2 = df.join(pd.DataFrame(columns=['b']))

The advantage of this method is that you can add an arbitrary number of columns without explicit loops.

In addition, this satisfies your requirement of df.empty evaluating to True if no data exists.
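A minimal sketch of the multi-column case, with 'b', 'c', 'd' as placeholder names:

```python
import pandas as pd

df = pd.DataFrame(columns=['a'])

# Joining on the (empty) index adds every column of the right-hand frame
# in one call; no loop over the new names is needed.
df2 = df.join(pd.DataFrame(columns=['b', 'c', 'd']))

print(list(df2.columns))  # ['a', 'b', 'c', 'd']
print(df2.empty)          # True
print(list(df.columns))   # ['a'], since join returns a new frame
```

If any of the new names already exists in df, join raises a ValueError about overlapping columns unless lsuffix/rsuffix are passed.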

jpp
  • Why do you have to copy? – MrR May 21 '21 at 22:37
  • @MrR, the question states: `for some reason I want to generate df2, another empty dataframe,`. – jpp May 22 '21 at 08:11
  • `df2 = df.join(pd.DataFrame(columns=['b']))` is sufficient. No need for `df2 = df.copy()` – MrR May 22 '21 at 18:10
  • Upvoted. PS: This should be added to the first answer - it's missing from that nice compendium presented there, and it's one of the most elegant ways (if not the most elegant). – MrR May 26 '21 at 22:43

You can use concat:

df=pd.DataFrame(columns=['a'])
df
Out[568]: 
Empty DataFrame
Columns: [a]
Index: []

df2=pd.DataFrame(columns=['b', 'c', 'd'])
pd.concat([df,df2])
Out[571]: 
Empty DataFrame
Columns: [a, b, c, d]
Index: []
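A sketch with deliberately unsorted, hypothetical column names (the frame `extra`), passing `sort=False` explicitly so the resulting column order does not depend on the default of a particular pandas version:

```python
import pandas as pd

df = pd.DataFrame(columns=['a'])
extra = pd.DataFrame(columns=['z', 'b'])  # hypothetical, unsorted names

# With sort=False, df's columns come first, followed by the new columns
# of `extra` in the order they appear, rather than being sorted.
combined = pd.concat([df, extra], sort=False)

print(list(combined.columns))  # ['a', 'z', 'b']
print(combined.empty)          # True
```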
MrR
BENY