Adding multiple rows to pandas dataframe based on returned lambda function

Question

I have a pandas dataframe that can be represented as follows:

import pandas as pd

myDF = pd.DataFrame({'value':[5,2,4,3,6,1,4,8]})
print(myDF)

   value
0      5
1      2
2      4
3      3
4      6
5      1
6      4
7      8

I can add a new column containing the returned value from a function that acts on the contents of the 'value' column. For example, I can add a column called 'square', which contains the square of the value, by defining a function and then using lambda, as follows:

def myFunc(x):
    mySquare = x*x
    return mySquare

myDF['square'] = myDF['value'].map(lambda x: myFunc(x))

...to produce

   value  square
0      5      25
1      2       4
2      4      16
3      3       9
4      6      36
5      1       1
6      4      16
7      8      64

(N.B. The actual function I'm using is more complex than this but this simple squaring process is OK for illustration.)

My question is, can the myFunc() function return a tuple (or a dictionary or a list) that could be used to add multiple new columns in the dataframe? As a (very simple) example, to add new columns for squares, cubes, fourth powers, is it possible to do something akin to:

def myFunc(x):
    mySquare = x*x
    myCube = x*x*x
    myFourth = x*x*x*x
    return mySquare,myCube,myFourth

myDF['square'],myDF['cubed'],myDF['fourth'] = myDF['value'].map(lambda x: myFunc(x))

...to produce the following:

   value  square  cubed  fourth
0      5      25    125     625
1      2       4      8      16
2      4      16     64     256
3      3       9     27      81
4      6      36    216    1296
5      1       1      1       1
6      4      16     64     256
7      8      64    512    4096

Writing 3 separate functions would seem to be unnecessarily repetitive. None of the variations I've tried so far has worked (the above fails with: ValueError: too many values to unpack (expected 3)).

As mentioned above, the examples of squares, cubes and fourth powers are just for illustration purposes. I know that there are much more effective ways to calculate these values in a dataframe. However, I'm interested in the method to add several columns to a dataframe based on stepping through each cell of a column.

score 0 · Answer 1 · answered Aug 11 '16 at 02:07

You can create a dataframe based on the results and then concatenate it to your original dataframe. You then need to rename your columns.

df = pd.concat([myDF, pd.DataFrame([myFunc(x) for x in myDF['value']])], axis=1)
df.columns = myDF.columns.tolist() + ['square', 'cubed', 'fourth']
print(df)

   value  square  cubed  fourth
0      5      25    125     625
1      2       4      8      16
2      4      16     64     256
3      3       9     27      81
4      6      36    216    1296
5      1       1      1       1
6      4      16     64     256
7      8      64    512    4096
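
If the list comprehension is hard to read, the same steps can be spelled out with an explicit loop. This is only a sketch of an equivalent form, not the answer's own code: the names rows and new_cols are illustrative, and the column names are passed directly to the DataFrame constructor so no renaming step is needed.

```python
import pandas as pd

myDF = pd.DataFrame({'value': [5, 2, 4, 3, 6, 1, 4, 8]})

def myFunc(x):
    return x * x, x * x * x, x * x * x * x

# Same as [myFunc(x) for x in myDF['value']], written as a loop.
rows = []
for x in myDF['value']:
    rows.append(myFunc(x))  # one (square, cube, fourth) tuple per row

# Each tuple becomes one row of the new frame; reusing myDF's index
# makes concat line the two frames up row by row.
new_cols = pd.DataFrame(rows, columns=['square', 'cubed', 'fourth'],
                        index=myDF.index)
df = pd.concat([myDF, new_cols], axis=1)
print(df)
```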

Thanks very much for the response. I'm still not sure I completely understand how your answer works – I just don't find list comprehensions very intuitive (that's what you've used isn't it?). In addition, @Alexander marked this as duplicate of another question - which turns out to be correct. The answer there given by John Galt was exactly what I needed. — user1718097, Aug 12 '16 at 23:35

Matthias Fripp · Answer 2 · 2016-08-13T00:48:57.220

You can do this by unpacking and repacking the result of myFunc() (also note, you don't need a lambda if you already have myFunc available):

myDF['square'],myDF['cubed'],myDF['fourth'] = zip(*myDF['value'].map(myFunc))

Using zip(*arg) is a standard trick to transpose a collection of tuples. The * unpacks the result so that each row becomes a separate argument to zip(). Then zip() combines the first element of each argument into one tuple (your first column), the second elements into another tuple, and so on. This also explains the original ValueError: without zip(), the unpacking sees 8 row tuples rather than 3 columns.
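
To see the transposition on its own, here is a minimal sketch without pandas. The list rows is made up for illustration and holds the tuples myFunc() would return for 5, 2 and 4.

```python
# Three result tuples, one per row: (square, cube, fourth).
rows = [(25, 125, 625), (4, 8, 16), (16, 64, 256)]

# zip(*rows) is the same call as zip(rows[0], rows[1], rows[2]).
squares, cubes, fourths = zip(*rows)

print(squares)  # (25, 4, 16)
print(cubes)    # (125, 8, 64)
print(fourths)  # (625, 16, 256)
```

Three rows of three values go in, and three columns of three values come out, which is why the left-hand side can unpack into exactly three names however many rows there are.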

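Or, if myFunc() uses only operations that work element-wise on a whole Series (as the arithmetic here does), you could call it once on the column and assign the results tuple-wise: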

myDF['square'],myDF['cubed'],myDF['fourth'] = myFunc(myDF.value)
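
A short sketch of what happens in that call, assuming the same myFunc() as in the question: it runs once, receives the whole column, and returns a tuple of three Series, so ordinary tuple unpacking is enough.

```python
import pandas as pd

myDF = pd.DataFrame({'value': [5, 2, 4, 3, 6, 1, 4, 8]})

def myFunc(x):
    return x * x, x * x * x, x * x * x * x

# Called once with the whole column, so x is a Series, not a scalar.
result = myFunc(myDF['value'])
print(type(result), len(result))  # a tuple holding 3 Series

# Three Series on the right, three columns on the left.
myDF['square'], myDF['cubed'], myDF['fourth'] = result
print(myDF)
```

This will not work for a function that branches on its argument (for example, one containing `if x > 3:`), because the truth value of a whole Series is ambiguous and pandas raises a ValueError. In that case the zip(*...) form above is the safer choice.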

Usually, for the sake of readability, I would do something like this:

myDF = pd.DataFrame(
    dict(
        value=myDF['value'],
        square=myDF['value'] ** 2,
        cube=myDF['value'] ** 3,
        fourth=myDF['value'] ** 4
    ),
    columns=['value', 'square', 'cube', 'fourth']  # set column order
)

But really it's hard to beat this:

myDF['square'] = myDF['value'] ** 2
myDF['cube']   = myDF['value'] ** 3
myDF['fourth'] = myDF['value'] ** 4

This is a "pythonic" solution in the sense that it is simple, readable, easy to debug and efficient (i.e., it makes good use of pandas' built-in capabilities).

Thanks for the response. It has taken me a while to understand how your answer worked – but that's due to my limited understanding of map(). I also hadn't come across zip() before so that was extremely useful. However, @Alexander marked this as duplicate of another question - which turns out to be correct. The answer there given by John Galt was exactly what I needed. — user1718097, Aug 12 '16 at 23:28
@user1718097, sounds good. Just watch out with that solution if you need to create columns with different data types. `Series()` will convert them all to a single 'best' data type. In that case, something like the code above may be more suited. — Matthias Fripp, Aug 13 '16 at 00:44
Thanks for the heads-up! It's comments and support like this that make SO such a fantastic learning environment. — user1718097, Aug 14 '16 at 14:05
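
The data type caveat in the comments can be illustrated with a small sketch. The linked answer is not reproduced in this thread, so the Series-per-row pattern below is an assumption about its general shape, and mixedFunc is a hypothetical function that returns an integer and a float.

```python
import pandas as pd

myDF = pd.DataFrame({'value': [5, 2, 4]})

def mixedFunc(x):
    # Hypothetical: an integer result and a float result.
    return x * x, x / 2

# Assumed Series-based pattern: each row's results are packed into a
# Series, which can hold only one dtype, so the integers become floats.
via_series = myDF['value'].apply(
    lambda x: pd.Series(mixedFunc(x), index=['square', 'half'])
)
print(via_series.dtypes)  # both columns come out as float64

# The zip(*...) form builds each column separately, so 'square'
# keeps an integer dtype while 'half' is float.
myDF['square'], myDF['half'] = zip(*myDF['value'].map(mixedFunc))
print(myDF.dtypes)
```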
