Is it possible to add several columns at once to a pandas DataFrame?

Question

If I want to create a new DataFrame with several columns, I can add all the columns at once -- for example, as follows:

import pandas as pd

data = {'col_1': [0, 1, 2, 3],
        'col_2': [4, 5, 6, 7]}
df = pd.DataFrame(data)

But now suppose farther down the road I want to add a set of additional columns to this DataFrame. Is there a way to add them all simultaneously, as in

additional_data = {'col_3': [8, 9, 10, 11],
                   'col_4': [12, 13, 14, 15]}
#Below is a made-up function of the kind I desire.
df.add_data(additional_data)

I'm aware I could do this:

for key, value in additional_data.items():
    df[key] = value

Or this:

df2 = pd.DataFrame(additional_data, index=df.index)
df = pd.merge(df, df2, left_index=True, right_index=True)

I was just hoping for something cleaner. If I'm stuck with these two options, which is preferred?

Tbh I think the for loop could be the best/most efficient way from a dict, although join feels "cleaner". — Andy Hayden, Nov 08 '13 at 19:54
@AndyHayden Maybe it's possible to create a new expand method. I can't say for sure, but if the number of columns is large, a loop should be less efficient than a one-time expansion of the array, or am I wrong? — Roman Pekar, Nov 08 '13 at 19:58
@RomanPekar if it's already a python object (a dict) I don't think it's going to get any faster. Hmmm have to test — Andy Hayden, Nov 08 '13 at 20:10
This question seems very similar: https://stackoverflow.com/questions/39050539/adding-multiple-columns-to-pandas-simultaneously — j sad, Jun 28 '19 at 18:37

score 21 · Accepted Answer · edited Jan 29 '18 at 22:38

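pandas has had an `assign` method since version 0.16.0. It returns a new DataFrame with the extra columns added. With `df1` and `df2` built from `data` and `additional_data`, you can use it like this: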

In [1506]: df1.assign(**df2)
Out[1506]:
   col_1  col_2  col_3  col_4
0      0      4      8     12
1      1      5      9     13
2      2      6     10     14
3      3      7     11     15

Or you could pass the dictionary directly:

In [1507]: df1.assign(**additional_data)
Out[1507]:
   col_1  col_2  col_3  col_4
0      0      4      8     12
1      1      5      9     13
2      2      6     10     14
3      3      7     11     15
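
For anyone who wants to run this end to end, here is a minimal sketch. It assumes `df1` is built from the question's `data` dict; the name `result` is only for illustration. Two things worth knowing: `assign` returns a new DataFrame rather than modifying `df1`, and `**` unpacking only works when the column names are strings.

```python
import pandas as pd

data = {'col_1': [0, 1, 2, 3],
        'col_2': [4, 5, 6, 7]}
additional_data = {'col_3': [8, 9, 10, 11],
                   'col_4': [12, 13, 14, 15]}

df1 = pd.DataFrame(data)

# assign() returns a new DataFrame; df1 itself is not modified.
result = df1.assign(**additional_data)

print(list(df1.columns))     # original columns only
print(list(result.columns))  # original columns plus the new ones
```

If you want the change to stick, rebind the name: `df1 = df1.assign(**additional_data)`.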

score 10 · Answer 2 · 2013-11-08T19:06:23.303

What you need is the join function:

df1.join(df2, how='outer')
# or
df1.join(df2)  # also works: the default how='left' gives the same result when the indexes match

Example:

import pandas as pd

data = {'col_1': [0, 1, 2, 3],
        'col_2': [4, 5, 6, 7]}
df1 = pd.DataFrame(data)

additional_data = {'col_3': [8, 9, 10, 11],
                   'col_4': [12, 13, 14, 15]}
df2 = pd.DataFrame(additional_data)

df1.join(df2, how='outer')

output:

   col_1  col_2  col_3  col_4
0      0      4      8     12
1      1      5      9     13
2      2      6     10     14
3      3      7     11     15
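
The two `join` calls above give the same result only because both frames share the same index. A rough sketch of where they differ, using a hypothetical `df2` whose index only partly overlaps that of `df1` (the frames and the names `left` and `outer` are made up for illustration):

```python
import pandas as pd

df1 = pd.DataFrame({'col_1': [0, 1, 2, 3]})
# Hypothetical second frame: label 3 is missing, label 4 is extra.
df2 = pd.DataFrame({'col_3': [8, 9, 10, 11]}, index=[0, 1, 2, 4])

left = df1.join(df2)                # default how='left': keeps df1's index
outer = df1.join(df2, how='outer')  # keeps the union of both indexes

print(left.index.tolist())
print(outer.index.tolist())
```

With the default left join, row 3 gets NaN in `col_3` and label 4 is dropped; with `how='outer'`, label 4 is kept and gets NaN in `col_1`.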

edited Nov 08 '13 at 19:06

answered Nov 08 '13 at 18:56

Ah, thanks! But am I correct to infer there is no way to add `additional_data` to `df1` without first making a new DataFrame, `df2` (aside from the `for` loop option)? It'd be nice to have `df1.add_data(additional_data)`. I wonder whether Wes would be down for accepting such a submission to pandas. – abcd Nov 08 '13 at 19:16

score 8 · Roman Pekar · Answer 3 · 2013-11-08T19:36:11.893

If you don't want to create a new DataFrame from additional_data, you can use something like this:

>>> additional_data = [[8, 9, 10, 11], [12, 13, 14, 15]]
>>> df['col3'], df['col4'] = additional_data
>>> df
   col_1  col_2  col3  col4
0      0      4     8    12
1      1      5     9    13
2      2      6    10    14
3      3      7    11    15

It's also possible to do something like this, but it creates a new DataFrame rather than modifying the existing one in place:

>>> additional_header = ['col_3', 'col_4']
>>> additional_data = [[8, 9, 10, 11], [12, 13, 14, 15]]
>>> df = pd.DataFrame(data=np.concatenate((df.values.T, additional_data)).T, columns=np.concatenate((df.columns, additional_header)))
>>> df
   col_1  col_2  col_3  col_4
0      0      4      8     12
1      1      5      9     13
2      2      6     10     14
3      3      7     11     15

edited Nov 08 '13 at 19:36

answered Nov 08 '13 at 19:20


Ah, good call. But this option isn't so great when the number of new columns, and their names, can be variable. – abcd Nov 08 '13 at 19:29
@dbliss Exactly, I'm wondering whether it's possible to do this in general. – Roman Pekar Nov 08 '13 at 19:30
@dbliss added another possible solution. Can't say I like it very much. – Roman Pekar Nov 08 '13 at 19:53

Zaros · Answer 4 · 2015-08-21T14:08:16.153

All you need to do is create the new columns with data from the additional dataframe.

data =            {'col_1': [0, 1, 2, 3],
                   'col_2': [4, 5, 6, 7]}
additional_data = {'col_3': [8, 9, 10, 11],
                   'col_4': [12, 13, 14, 15]}
df = pd.DataFrame(data)
df2 = pd.DataFrame(additional_data)

df[df2.columns] = df2

df now looks like:

   col_1  col_2  col_3  col_4
0      0      4      8     12
1      1      5      9     13
2      2      6     10     14
3      3      7     11     15

Indices from the original dataframe will be used as if you had performed an in-place left join. Data from the original dataframe in columns with a matching name in the additional dataframe will be overwritten. For example:

data =            {'col_1': [0, 1, 2, 3],
                   'col_2': [4, 5, 6, 7]}
additional_data = {'col_2': [8, 9, 10, 11],
                   'col_3': [12, 13, 14, 15]}
df = pd.DataFrame(data)
df2 = pd.DataFrame(additional_data, index=[0,1,2,4])

df[df2.columns] = df2

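df now looks like this (col_2 and col_3 are upcast to float because row 3 has no match and is filled with NaN):

   col_1  col_2  col_3
0      0    8.0   12.0
1      1    9.0   13.0
2      2   10.0   14.0
3      3    NaN    NaN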
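
If you want to check the alignment behaviour described above yourself, here is a small sketch that repeats the second example and inspects the result. It only uses the frames from this answer; I would expect the overwritten and new columns to end up as floats, since NaN cannot be stored in an integer column.

```python
import pandas as pd

df = pd.DataFrame({'col_1': [0, 1, 2, 3],
                   'col_2': [4, 5, 6, 7]})
df2 = pd.DataFrame({'col_2': [8, 9, 10, 11],
                    'col_3': [12, 13, 14, 15]},
                   index=[0, 1, 2, 4])

# Values are aligned on df's index: label 4 is dropped, label 3 gets NaN.
df[df2.columns] = df2

print(df.index.tolist())  # the original index is kept
print(df.dtypes)          # check which columns were upcast
print(df.isna().sum())    # count of missing values per column
```

If silently introduced NaN values would be a problem, it may be worth comparing `df.index` and `df2.index` before assigning.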
