How to add multiple columns to pandas dataframe in one assignment

Question

I'm trying to figure out how to add multiple columns to a pandas DataFrame simultaneously. I would like to do this in one step rather than in multiple repeated steps.

import pandas as pd
import numpy as np

data = {'col_1': [0, 1, 2, 3],
        'col_2': [4, 5, 6, 7]}
df = pd.DataFrame(data)

I thought this would work here...

df[['column_new_1', 'column_new_2', 'column_new_3']] = [np.nan, 'dogs', 3]

Appending columns to an empty dataframe is the simplest solution I found after getting errors from the solutions provided below and from other threads: https://onecompiler.com/python/3zhtqjrut Source: https://www.geeksforgeeks.org/how-to-create-an-empty-dataframe-and-append-rows-columns-to-it-in-pandas/?ref=gcse — Lod, Aug 17 '23 at 17:15

score 384 · Accepted Answer · edited Apr 10 '23 at 18:42

I would have expected your syntax to work too. The problem arises because when you create new columns with the column-list syntax (df[[new1, new2]] = ...), pandas requires that the right hand side be a DataFrame (note that it doesn't actually matter if the columns of the DataFrame have the same names as the columns you are creating).

Your syntax works fine for assigning scalar values to existing columns, and pandas is also happy to assign scalar values to a new column using the single-column syntax (df[new1] = ...). So the solution is either to convert this into several single-column assignments, or create a suitable DataFrame for the right-hand side.
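As a minimal sketch of the first route (several single-column assignments), the loop below drives the assignments from a dict. The name `new_values` is only illustrative and does not come from the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col_1': [0, 1, 2, 3],
                   'col_2': [4, 5, 6, 7]})

# Hypothetical helper mapping: new column name -> scalar fill value
new_values = {'column_new_1': np.nan,
              'column_new_2': 'dogs',
              'column_new_3': 3}

# Single-column assignment broadcasts each scalar down the whole column
for name, value in new_values.items():
    df[name] = value

print(df)
```

This is still one assignment per column under the hood; the loop only keeps the names and values together in one place.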

Here are several approaches that will work:

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'col_1': [0, 1, 2, 3],
    'col_2': [4, 5, 6, 7]
})

Then one of the following:

1) Three assignments in one, using list unpacking:

df['column_new_1'], df['column_new_2'], df['column_new_3'] = [np.nan, 'dogs', 3]

2) `DataFrame` conveniently expands a single row to match the index, so you can do this:

df[['column_new_1', 'column_new_2', 'column_new_3']] = pd.DataFrame([[np.nan, 'dogs', 3]], index=df.index)

3) Make a temporary DataFrame with new columns, then combine to the original DataFrame with `.concat`:

df = pd.concat(
    [
        df,
        pd.DataFrame(
            [[np.nan, 'dogs', 3]], 
            index=df.index, 
            columns=['column_new_1', 'column_new_2', 'column_new_3']
        )
    ], axis=1
)

4) Similar to 3, but using `join` instead of `concat` (may be less efficient):

df = df.join(pd.DataFrame(
    [[np.nan, 'dogs', 3]], 
    index=df.index, 
    columns=['column_new_1', 'column_new_2', 'column_new_3']
))

5) Using a `dict` is a more "natural" way to create the new DataFrame than the previous two, but the new columns will be sorted alphabetically on older setups (before pandas 0.23 or Python 3.6; newer versions keep the dict's insertion order):

df = df.join(pd.DataFrame(
    {
        'column_new_1': np.nan,
        'column_new_2': 'dogs',
        'column_new_3': 3
    }, index=df.index
))

6) Use `.assign()` with multiple column arguments.

I like this variant on @zero's answer a lot, but like the previous one, the new columns will be sorted alphabetically on older setups (before pandas 0.23 or Python 3.6):

df = df.assign(column_new_1=np.nan, column_new_2='dogs', column_new_3=3)

7) This is interesting (based on this answer), but I don't know when it would be worth the trouble:

new_cols = ['column_new_1', 'column_new_2', 'column_new_3']
new_vals = [np.nan, 'dogs', 3]
df = df.reindex(columns=df.columns.tolist() + new_cols)   # add empty cols
df[new_cols] = new_vals  # multi-column assignment works for existing cols

8) In the end, it's hard to beat three separate assignments:

df['column_new_1'] = np.nan
df['column_new_2'] = 'dogs'
df['column_new_3'] = 3
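The options above are meant to end up with the same three columns. As a rough sketch, assuming a reasonably recent environment (pandas 0.23+ on Python 3.6+, so column order follows insertion order), the snippet below builds the result three ways (options 5, 6 and 8) and prints the resulting column lists for comparison. The variable names are illustrative only.

```python
import numpy as np
import pandas as pd

base = pd.DataFrame({'col_1': [0, 1, 2, 3],
                     'col_2': [4, 5, 6, 7]})
new_cols = ['column_new_1', 'column_new_2', 'column_new_3']

# Option 8: separate assignments, done on a copy so `base` stays untouched
by_separate = base.copy()
by_separate['column_new_1'] = np.nan
by_separate['column_new_2'] = 'dogs'
by_separate['column_new_3'] = 3

# Option 5: join a temporary DataFrame built from a dict of scalars
by_join = base.join(pd.DataFrame(
    {'column_new_1': np.nan, 'column_new_2': 'dogs', 'column_new_3': 3},
    index=base.index
))

# Option 6: .assign() with keyword arguments
by_assign = base.assign(column_new_1=np.nan, column_new_2='dogs', column_new_3=3)

for name, result in [('separate', by_separate),
                     ('join', by_join),
                     ('assign', by_assign)]:
    print(name, list(result.columns))
```

Note that `join` and `assign` return new frames, while the plain assignments modify the frame they are applied to, which is why the sketch works on a copy.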

Note: many of these options have already been covered in other questions.

score 56 · Answer 2 · answered Oct 04 '17 at 20:02

You could use assign with a dict of column names and values.

In [1069]: df.assign(**{'col_new_1': np.nan, 'col2_new_2': 'dogs', 'col3_new_3': 3})
Out[1069]:
   col_1  col_2 col2_new_2  col3_new_3  col_new_1
0      0      4       dogs           3        NaN
1      1      5       dogs           3        NaN
2      2      6       dogs           3        NaN
3      3      7       dogs           3        NaN

score 28 · Answer 3 · answered Jun 13 '22 at 15:20

My goal when writing Pandas is to write efficient, readable code that I can chain. I won't go into why I like chaining so much here; I expound on that in my book, Effective Pandas.

I often want to add new columns in a succinct manner that also allows me to chain. My general rule is that I update or create columns using the .assign method.

To answer your question, I would use the following code:

(df
 .assign(column_new_1=np.nan,
         column_new_2='dogs',
         column_new_3=3
        )
)

To go a little further: I often have a second dataframe whose columns I want to add to my dataframe. Let's assume it looks like, say, a dataframe with the three columns you want:

df2 = pd.DataFrame({'column_new_1': np.nan,
                    'column_new_2': 'dogs',
                    'column_new_3': 3},
                   index=df.index
                  )

In this case I would write the following code:

(df
 .assign(**df2)
)
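To make the chaining point concrete, here is a small self-contained sketch. It assumes the same `df` and `df2` as above; the `.rename(columns=str.upper)` step is only a stand-in for whatever would come next in a real chain.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col_1': [0, 1, 2, 3],
                   'col_2': [4, 5, 6, 7]})
df2 = pd.DataFrame({'column_new_1': np.nan,
                    'column_new_2': 'dogs',
                    'column_new_3': 3},
                   index=df.index)

# **df2 unpacks the column labels as keyword names (they must be strings)
result = (df
          .assign(**df2)
          .rename(columns=str.upper)   # any further chained step; purely illustrative
         )

print(list(df.columns))      # the original frame is not modified
print(list(result.columns))
```

Because `.assign` returns a new DataFrame, `df` keeps its two original columns unless the result is assigned back to it.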

I love `df.assign(**df2)` — trvjbr, Aug 17 '23 at 19:34

score 16 · Answer 4 · answered Aug 20 '16 at 05:00

With the use of concat:

In [128]: df
Out[128]: 
   col_1  col_2
0      0      4
1      1      5
2      2      6
3      3      7

In [129]: pd.concat([df, pd.DataFrame(columns = [ 'column_new_1', 'column_new_2','column_new_3'])])
Out[129]: 
   col_1  col_2 column_new_1 column_new_2 column_new_3
0    0.0    4.0          NaN          NaN          NaN
1    1.0    5.0          NaN          NaN          NaN
2    2.0    6.0          NaN          NaN          NaN
3    3.0    7.0          NaN          NaN          NaN

I'm not sure what you wanted to do with [np.nan, 'dogs', 3]. Maybe set them as default values?

In [142]: df1 = pd.concat([df, pd.DataFrame(columns = [ 'column_new_1', 'column_new_2','column_new_3'])])
In [143]: df1[[ 'column_new_1', 'column_new_2','column_new_3']] = [np.nan, 'dogs', 3]

In [144]: df1
Out[144]: 
   col_1  col_2  column_new_1 column_new_2  column_new_3
0    0.0    4.0           NaN         dogs             3
1    1.0    5.0           NaN         dogs             3
2    2.0    6.0           NaN         dogs             3
3    3.0    7.0           NaN         dogs             3

bradylange · Answer 5 · 2021-03-17T19:57:49.713

Dictionary mapping with .assign():

This is the most readable and dynamic way to assign new column(s) with value(s) when working with many of them.

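import pandas as pd
import numpy as np

# Example frame from the question, so the snippet runs on its own
df = pd.DataFrame({'col_1': [0, 1, 2, 3], 'col_2': [4, 5, 6, 7]})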

new_cols = ["column_new_1", "column_new_2", "column_new_3"]
new_vals = [np.nan, "dogs", 3]
# Map new columns as keys and new values as values
col_val_mapping = dict(zip(new_cols, new_vals))
# Unpack new column/new value pairs and assign them to the data frame
df = df.assign(**col_val_mapping)

If you're just trying to initialize the new columns as empty, either because you don't know yet what the values are going to be or because you have many new columns, map every column to None:

import pandas as pd
import numpy as np

new_cols = ["column_new_1", "column_new_2", "column_new_3"]
new_vals = [None] * len(new_cols)
# Map new columns as keys and new values as values
col_val_mapping = dict(zip(new_cols, new_vals))
# Unpack new column/new value pairs and assign them to the data frame
df = df.assign(**col_val_mapping)

score 3 · Answer 6 · piRSquared · 2016-08-20T15:09:23.787

Use a list comprehension, pd.DataFrame and pd.concat:

pd.concat(
    [
        df,
        pd.DataFrame(
            [[np.nan, 'dogs', 3] for _ in range(df.shape[0])],
            df.index, ['column_new_1', 'column_new_2','column_new_3']
        )
    ], axis=1)

edited Aug 20 '16 at 15:09 · answered Aug 20 '16 at 06:49 · piRSquared

score 3 · Answer 7 · answered May 02 '19 at 14:15

If you are adding a lot of missing columns (a, b, c, ...) with the same value, here 0, I did this:

    new_cols = ["a", "b", "c" ] 
    df[new_cols] = pd.DataFrame([[0] * len(new_cols)], index=df.index)

It's based on the second variant of the accepted answer.
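As an alternative sketch that avoids building a temporary DataFrame, the same result can be reached with `.assign()` and `dict.fromkeys`. This is not part of the original answer, and the frame below is just the example from the question.

```python
import pandas as pd

df = pd.DataFrame({'col_1': [0, 1, 2, 3],
                   'col_2': [4, 5, 6, 7]})

new_cols = ["a", "b", "c"]
# dict.fromkeys maps every new column name to the same scalar (0 here)
df = df.assign(**dict.fromkeys(new_cols, 0))

print(df)
```

This only works when the column names are strings, since they are passed as keyword arguments.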

score 3 · Answer 8 · answered Oct 07 '22 at 06:49

You can use tuple unpacking:

df = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})

df['col3'], df['col4'] = 'a', 10

Result:

   col1  col2 col3  col4
0     1     3    a    10
1     2     4    a    10

answered Oct 07 '22 at 06:49 · Mykola Zotko

score 2 · Answer 9 · edited Jun 20 '20 at 09:12

Just want to point out that option 2 in @Matthias Fripp's answer

(2) I wouldn't necessarily expect DataFrame to work this way, but it does

df[['column_new_1', 'column_new_2', 'column_new_3']] = pd.DataFrame([[np.nan, 'dogs', 3]], index=df.index)

is already documented in pandas' own documentation http://pandas.pydata.org/pandas-docs/stable/indexing.html#basics

You can pass a list of columns to [] to select columns in that order. If a column is not contained in the DataFrame, an exception will be raised. Multiple columns can also be set in this manner. You may find this useful for applying a transform (in-place) to a subset of the columns.
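A small sketch of the "transform a subset of the columns" use mentioned in that passage, using the example frame from the question. The factor of 10 is arbitrary.

```python
import pandas as pd

df = pd.DataFrame({'col_1': [0, 1, 2, 3],
                   'col_2': [4, 5, 6, 7]})

# Both columns already exist, so the column-list syntax can overwrite them
# with a transformed copy of themselves
df[['col_1', 'col_2']] = df[['col_1', 'col_2']] * 10

print(df)
```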

score 0 · Answer 10 · answered Jul 23 '19 at 11:23

If you just want to add empty new columns, reindex will do the job

df
   col_1  col_2
0      0      4
1      1      5
2      2      6
3      3      7

df.reindex(list(df)+['column_new_1', 'column_new_2','column_new_3'], axis=1)
   col_1  col_2  column_new_1  column_new_2  column_new_3
0      0      4           NaN           NaN           NaN
1      1      5           NaN           NaN           NaN
2      2      6           NaN           NaN           NaN
3      3      7           NaN           NaN           NaN

full code example

import numpy as np
import pandas as pd

df = {'col_1': [0, 1, 2, 3],
        'col_2': [4, 5, 6, 7]}
df = pd.DataFrame(df)
print('df',df, sep='\n')
print()
df=df.reindex(list(df)+['column_new_1', 'column_new_2','column_new_3'], axis=1)
print('''df.reindex(list(df)+['column_new_1', 'column_new_2','column_new_3'], axis=1)''',df, sep='\n')

Otherwise, go for @zero's answer with assign.
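If one shared default other than NaN is wanted, `reindex` also takes a `fill_value` argument. The sketch below uses 0 as an arbitrary default; it is an addition to the answer, not something the answer itself shows.

```python
import pandas as pd

df = pd.DataFrame({'col_1': [0, 1, 2, 3],
                   'col_2': [4, 5, 6, 7]})

new_cols = ['column_new_1', 'column_new_2', 'column_new_3']
# fill_value replaces the default NaN in the columns that reindex creates
df = df.reindex(columns=list(df.columns) + new_cols, fill_value=0)

print(df)
```

This covers the case of one common default only; different values per column still need one of the other approaches.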

score 0 · Answer 11 · edited May 12 '20 at 12:25

I am not comfortable using "Index" and so on, so I came up with the following:

df.columns
Index(['A123', 'B123'], dtype='object')

df=pd.concat([df,pd.DataFrame(columns=list('CDE'))])

df.rename(columns={
    'C':'C123',
    'D':'D123',
    'E':'E123'
},inplace=True)


df.columns
Index(['A123', 'B123', 'C123', 'D123', 'E123'], dtype='object')

edited May 12 '20 at 12:25 · Nensi Kasundra

answered May 12 '20 at 09:57 · Alex

score 0 · Answer 12 · answered Sep 03 '20 at 04:37

You could instantiate the values from a dictionary if you wanted different values for each column & you don't mind making a dictionary on the line before.

>>> import pandas as pd
>>> import numpy as np
>>> df = pd.DataFrame({
  'col_1': [0, 1, 2, 3], 
  'col_2': [4, 5, 6, 7]
})
>>> df
   col_1  col_2
0      0      4
1      1      5
2      2      6
3      3      7
>>> cols = {
  'column_new_1':np.nan,
  'column_new_2':'dogs',
  'column_new_3': 3
}
>>> df[list(cols)] = pd.DataFrame(data={k:[v]*len(df) for k,v in cols.items()})
>>> df
   col_1  col_2  column_new_1 column_new_2  column_new_3
0      0      4           NaN         dogs             3
1      1      5           NaN         dogs             3
2      2      6           NaN         dogs             3
3      3      7           NaN         dogs             3

Not necessarily better than the accepted answer, but it's another approach not yet listed.

score 0 · Answer 13 · answered May 17 '22 at 10:29

import pandas as pd
df = pd.DataFrame({
 'col_1': [0, 1, 2, 3], 
 'col_2': [4, 5, 6, 7]
 })
df['col_3'],  df['col_4'] =  [df.col_1]*2

>>> df
   col_1  col_2  col_3  col_4
0      0      4      0      0
1      1      5      1      1
2      2      6      2      2
3      3      7      3      3

score -1 · Answer 14 · edited Apr 09 '23 at 18:36

I tried your original approach (the one you said didn't work for you) and it worked fine for me, at least in my pandas version (1.5.2)

import pandas as pd
import numpy as np

data = {'col_1': [0, 1, 2, 3],
        'col_2': [4, 5, 6, 7]}
df = pd.DataFrame(data)

df[['column_new_1', 'column_new_2', 'column_new_3']] = [np.nan, 'dogs', 3]
print(pd.__version__)
print(df)

This is what I got:

1.5.2
   col_1  col_2  column_new_1 column_new_2  column_new_3
0      0      4           NaN         dogs             3
1      1      5           NaN         dogs             3
2      2      6           NaN         dogs             3
3      3      7           NaN         dogs             3

But there's a cooler and more versatile approach

Since you'll probably want to use some logic when adding new columns, another way to add new columns* to a dataframe in one go is to apply a row-wise function with the logic you want. In your example:

def add_3_new_fields_to_each_row(row: pd.Series) -> pd.Series:
    """ Adding 3 new fields to each row of a dataframe is the same as 
    adding 3 new columns to the dataframe """
    row['column_new_1'] = np.nan
    row['column_new_2'] = 'dogs'
    row['column_new_3'] = 3
    # the good thing of this approach is that you could even make the
    # values of "later" fields be dependent on the values of
    # "earlier" fields, all in one go
    return row  # this row now has 3 more fields

df = pd.DataFrame(data)
df_new = df.apply(add_3_new_fields_to_each_row, axis='columns')

By doing this, df is unchanged, but df_new is the dataframe you want:

   col_1  col_2  column_new_1 column_new_2  column_new_3
0    0.0    4.0           NaN         dogs             3
1    1.0    5.0           NaN         dogs             3
2    2.0    6.0           NaN         dogs             3
3    3.0    7.0           NaN         dogs             3

* (actually, it returns a new dataframe with the new columns, and doesn't modify the original dataframe)

Notice how `col_1, col_2` changed to float. That's because a Series (`row`) has to be a common dtype, which is int at the start, but switches to float when you assign NaN, then `object` when you assign `'dogs'`. That has the potential to cause all sorts of problems, so I wouldn't use that approach personally. Maybe they wouldn't be major problems, IDK, but still. — wjandrea, Apr 09 '23 at 18:39
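For comparison, a column-wise sketch that keeps the "later fields depend on earlier fields" idea without going through row Series is `.assign()` with callables. This swaps in a different technique from the answer above; `col_sum` and `col_sum_plus_3` are made-up names for illustration, and it assumes pandas 0.23+ on Python 3.6+, where later keywords can refer to columns created by earlier ones.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col_1': [0, 1, 2, 3],
                   'col_2': [4, 5, 6, 7]})

df_new = df.assign(
    column_new_1=np.nan,
    column_new_2='dogs',
    column_new_3=3,
    # `col_sum` and `col_sum_plus_3` are illustrative names, not from the answer
    col_sum=lambda d: d['col_1'] + d['col_2'],
    # a later keyword can refer to a column created by an earlier one
    col_sum_plus_3=lambda d: d['col_sum'] + d['column_new_3'],
)

print(df_new.dtypes)
```

Since each column is assigned as a whole, the existing integer columns are not pushed through a mixed-type row Series.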
