Pandas: how to pass DataFrame.assign arguments to add multiple new columns?

Question

How can assign be used to return a copy of the original DataFrame with multiple new columns added?

Desired result:

>>> df = pd.DataFrame({'A': range(1, 5), 'B': range(11, 15)})
>>> df.assign({'C': df.A.apply(lambda x: x ** 2), 'D': df.B * 2})
   A   B   C   D
0  1  11   1  22
1  2  12   4  24
2  3  13   9  26
3  4  14  16  28

The example above results in:

ValueError: Wrong number of items passed 2, placement implies 1.

Background:

The assign function in pandas returns a copy of the DataFrame with the newly assigned column added, e.g.

>>> df = df.assign(C=df.B * 2)
>>> df
   A   B   C
0  1  11  22
1  2  12  24
2  3  13  26
3  4  14  28

The 0.19.2 documentation for this function implies that more than one column can be added to the DataFrame:

Assigning multiple columns within the same assign is possible, but you cannot reference other columns created within the same assign call.

In addition:

Parameters:
kwargs : keyword, value pairs

keywords are the column names.

The source code for the function states that it accepts a dictionary:

def assign(self, **kwargs):
    """
    .. versionadded:: 0.16.0
    Parameters
    ----------
    kwargs : keyword, value pairs
        keywords are the column names. If the values are callable, they are computed 
        on the DataFrame and assigned to the new columns. If the values are not callable, 
        (e.g. a Series, scalar, or array), they are simply assigned.

    Notes
    -----
    Since ``kwargs`` is a dictionary, the order of your
    arguments may not be preserved. To make things predictable,
    the columns are inserted in alphabetical order, at the end of
    your DataFrame. Assigning multiple columns within the same
    ``assign`` is possible, but you cannot reference other columns
    created within the same ``assign`` call.
    """

    data = self.copy()

    # do all calculations first...
    results = {}
    for k, v in kwargs.items():

        if callable(v):
            results[k] = v(data)
        else:
            results[k] = v

    # ... and then assign
    for k, v in sorted(results.items()):
        data[k] = v

    return data

I think the docs should be clearer on how to make this work with multiple columns to avoid ambiguity with a provided example — EdChum, Feb 07 '17 at 23:06
@JJJ I rejected your tag edit because this question has nothing to do with python. See related post on meta. https://meta.stackoverflow.com/questions/303459/should-all-django-questions-get-a-python-tag-too — Alexander, Mar 14 '19 at 05:50
This still happens in pandas 1.3. Your motivating example only uses functions of a single column, so you wouldn't need `.assign`; you could simply do `df['C'] = df['A'] ** 2` and `df['D'] = df['B'] * 2`. You may want to change it to a more motivating example, e.g. functions taking 2+ columns, possibly also referencing columns defined earlier within the same `.assign`. — smci, Mar 19 '22 at 02:39
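
To illustrate the kind of example the last comment describes, here is a minimal sketch, assuming pandas 0.23 or later on Python 3.6+, where keyword order is preserved and callables are evaluated in order so a later one can see a column created earlier in the same call. The column formulas are made up for illustration. With the 0.19.2 source quoted in the question, all values are computed before any is assigned, so the second lambda would not find column C there.

```python
import pandas as pd

df = pd.DataFrame({'A': range(1, 5), 'B': range(11, 15)})

# Each callable receives the DataFrame as built so far (assumes pandas >= 0.23).
result = df.assign(
    C=lambda d: d['A'] * d['B'],   # function of two existing columns
    D=lambda d: d['C'] - d['A'],   # refers to C, created just above
)
print(result)
```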

score 36 · Accepted Answer · answered Feb 07 '17 at 23:03

You can create multiple columns by supplying each new column as a keyword argument:

df = df.assign(C=df['A']**2, D=df.B*2)

I got your example dictionary to work by unpacking the dictionary as keyword arguments using **:

df = df.assign(**{'C': df.A.apply(lambda x: x ** 2), 'D': df.B * 2})

It seems like assign should be able to take a dictionary directly, but based on the source code you posted, that does not appear to be supported currently.

The resulting output:

   A   B   C   D
0  1  11   1  22
1  2  12   4  24
2  3  13   9  26
3  4  14  16  28
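
For anyone who wants to check this themselves, here is a small self-contained sketch comparing the two forms. The variable names (by_keyword, new_columns, by_unpacking) are just illustrative; the data is the DataFrame from the question.

```python
import pandas as pd

df = pd.DataFrame({'A': range(1, 5), 'B': range(11, 15)})

# Form 1: each new column is its own keyword argument.
by_keyword = df.assign(C=df['A'] ** 2, D=df['B'] * 2)

# Form 2: build a dict first, then unpack it with ** so that each key
# is passed as a keyword argument.
new_columns = {'C': df['A'] ** 2, 'D': df['B'] * 2}
by_unpacking = df.assign(**new_columns)

# Both calls return new DataFrames; the original df is left unchanged.
print(by_keyword.equals(by_unpacking))
print(list(df.columns))
```

Passing the dict positionally, as in the question, fails because assign only accepts keyword arguments.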

So far, this is the only way I've found to use the assign function with column names that contain spaces, since, AFAIK, Python keyword arguments cannot have them. — Lucas Lima, Mar 05 '20 at 20:44
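
A quick sketch of the point in the comment above, using made-up column names ('A squared', 'B doubled') that are not valid Python identifiers and so can only be passed through dictionary unpacking:

```python
import pandas as pd

df = pd.DataFrame({'A': range(1, 5), 'B': range(11, 15)})

# Writing assign(A squared=...) would be a syntax error, but string keys
# with spaces are accepted when the dict is unpacked with **.
spaced = df.assign(**{'A squared': df['A'] ** 2, 'B doubled': df['B'] * 2})
print(spaced.columns.tolist())
```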
