13

Say I have a data frame

id col1 col2
1  1    foo
2  1    bar

And a list of column names

l = ['col3', 'col4', 'col5']

How do I add new columns to the data frame with zero as values?

id col1 col2 col3 col4 col5
1  1    foo     0    0    0
2  1    bar     0    0    0
arkisle

3 Answers

20

You could try direct assignment (assuming your dataframe is named df):

for col in l:
    df[col] = 0

Or use the DataFrame's assign method, which is slightly cleaner, particularly if the values may be a scalar, an array, or anything else that pandas can turn into a column. Note that assign returns a new DataFrame rather than modifying df in place.

# create a dictionary mapping each new column name to its value
d = dict.fromkeys(l, 0)
# assign returns a new DataFrame, so keep the result
df = df.assign(**d)

Pandas documentation on the assign method: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.assign.html
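
As a minimal sketch of the two approaches above, assuming the sample frame from the question (the names `df_loop` and `df_assigned` are illustrative and not from the original answer). It also shows that `assign` leaves `df` untouched, so the returned frame has to be kept:

```python
import pandas as pd

# Sample frame from the question
df = pd.DataFrame({"id": [1, 2], "col1": [1, 1], "col2": ["foo", "bar"]})
l = ["col3", "col4", "col5"]

# Approach 1: direct assignment, one column at a time (done on a copy here)
df_loop = df.copy()
for col in l:
    df_loop[col] = 0

# Approach 2: assign returns a new DataFrame; df itself is not changed
df_assigned = df.assign(**dict.fromkeys(l, 0))

print(df_assigned)
```

On Python 3.7+ dictionaries keep insertion order, so the new columns should come out in the order of `l`.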

Thtu
  • Your `d` can be written more simply as `dict.fromkeys(l, 0)`. (Warning, though: since we're using a dictionary we aren't guaranteed that the order will be that of `l`.) – DSM Jan 08 '16 at 01:28
  • Thanks! Change added. – Thtu Jan 08 '16 at 01:29
  • How would you enforce that the dtype of the column is int32 instead of float64? I tried `df[col] = int(0)`, and also converting the whole column with `astype(int)`, but it didn't work. – jimijazz Feb 01 '18 at 21:30
  • That's strange; it should be enough to do `df[col] = df[col].astype({col: "int32"})`. It would help to post a reproducible example. – Thtu Feb 24 '18 at 00:20
  • An example explaining how to pass a dictionary to assign: https://stackoverflow.com/questions/42101382/pandas-dataframe-assign-arguments – emmistar Apr 22 '21 at 21:19
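
On the dtype question in the comments, a hedged sketch: assigning the scalar `0` normally gives the new columns an integer dtype, and `astype` with a column-to-dtype mapping can narrow them to int32. The frame is the sample from the question; it does not reproduce jimijazz's actual setup, which was never posted.

```python
import pandas as pd

df = pd.DataFrame({"id": [1, 2], "col1": [1, 1], "col2": ["foo", "bar"]})
l = ["col3", "col4", "col5"]

for col in l:
    df[col] = 0

# Cast only the new columns; the dict maps column name -> target dtype
df = df.astype({col: "int32" for col in l})

print(df.dtypes)
```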
2

The current accepted answer produced the following warning on my machine (using pandas=1.4.2):

PerformanceWarning: DataFrame is highly fragmented.  This is usually the result of calling `frame.insert` many times, which has poor performance.  Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`

I got rid of these warnings by assigning new columns like so instead:

df.loc[:, l] = 0
Johan Dettmar
1

Actually, the provided solutions with assign and df.loc are pretty slow, and the PerformanceWarning appears.

I would modify the existing answer and use something like:

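import pandas as pd

# build all new columns at once, aligned to df's index
d = dict.fromkeys(l, 0)
temp_df = pd.DataFrame(d, index=df.index)

# join them to df in a single concat
df = pd.concat([df, temp_df], axis=1)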
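
To see why `index=df.index` matters, here is a small sketch with a non-default index (the index labels 10 and 20 and the name `result` are made up for illustration). `pd.concat(axis=1)` aligns rows on index labels, so a helper frame with different labels would produce extra rows filled with NaN instead of lining up.

```python
import pandas as pd

df = pd.DataFrame(
    {"id": [1, 2], "col1": [1, 1], "col2": ["foo", "bar"]},
    index=[10, 20],
)
l = ["col3", "col4", "col5"]

# Build all new columns in one frame that shares df's index
temp_df = pd.DataFrame(dict.fromkeys(l, 0), index=df.index)

# One concat instead of repeated single-column inserts
result = pd.concat([df, temp_df], axis=1)
print(result)
```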