
Quick Pandas DataFrame question... Just a conceptual question

Let's say I have a 3 column DataFrame. Call it df:

     A    B    C
0    1    2    3
1    1    2    3
2    1    2    3
3    1    2    3
4    1    2    3

Now let's say I have a function f(A,B,C), which in theory would take columns A, B, and C as inputs. For example,

 def function(A,B,C):
     return (A+1, B/2, C*3)

This function returns a tuple, of course.

Essentially, I'd like to know if I could apply function to df to get the following output:

     A    B    C
0    2    1    9
1    2    1    9
2    2    1    9
3    2    1    9
4    2    1    9

If so, how would I do that? I can't just type df.apply(function). I'll get a TypeError that says something like:

function() takes exactly 3 arguments (1 given)
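The error comes from how `apply` calls the function. Here is a minimal sketch reproducing it, assuming Python 3 and a recent pandas. By default `apply` calls the function once per column and passes that whole column as a single Series, so `function` receives one argument instead of three. Python 3 words the message differently from the Python 2 text quoted above.

```python
import pandas as pd

df = pd.DataFrame({"A": [1] * 5, "B": [2] * 5, "C": [3] * 5})

def function(A, B, C):
    return (A + 1, B / 2, C * 3)

error_message = ""
try:
    # apply() hands each column to function as ONE positional argument
    df.apply(function)
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```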

If I can't do that, I presume I would have to create individual functions? Like...

def f1(A):
    return A+1

def f2(B):
    return B/2

def f3(C):
    return C*3
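Separate functions do work. Because each one only uses arithmetic operators, it can be called on a whole column (a Series) at once rather than value by value. This is a minimal sketch, assuming Python 3 (so `B / 2` is true division and yields floats), that rebuilds the DataFrame from the three results:

```python
import pandas as pd

df = pd.DataFrame({"A": [1] * 5, "B": [2] * 5, "C": [3] * 5})

def f1(A):
    return A + 1

def f2(B):
    return B / 2

def f3(C):
    return C * 3

# Each function receives an entire column and returns a new Series
result = pd.DataFrame({"A": f1(df["A"]), "B": f2(df["B"]), "C": f3(df["C"])})
print(result)
```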
maths_student15

3 Answers


You could do this:

>>> pandas.concat(function(*[col for colname, col in df.items()]), axis=1)
    A  B  C
0  2  1  9
1  2  1  9
2  2  1  9
3  2  1  9
4  2  1  9

If your function operates row-wise (i.e., it accepts three individual values A, B, and C and returns a tuple of three outputs), then you can do it like this:

>>> df.apply(lambda r: pandas.Series(function(*r), index=r.index), axis=1)
    A  B  C
0  2  1  9
1  2  1  9
2  2  1  9
3  2  1  9
4  2  1  9

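(You need to wrap the call in a lambda in order to pass the elements of each row as separate arguments, and to turn the returned tuple into a Series labeled with the row's index so that pandas expands it back into columns A, B and C.) But this is inefficient if your function is vectorizable, since then you want to operate on the whole column at once rather than redoing the operation for each row.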
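To make the efficiency point concrete, this minimal sketch (assuming pandas 1.1 or later) counts how often `function` runs. The `calls` dict is a hypothetical counter added only for illustration. The row-wise `apply` invokes `function` once per row, while the column-wise `concat` version invokes it once for the whole frame, and both produce the same values.

```python
import pandas as pd

df = pd.DataFrame({"A": [1] * 5, "B": [2] * 5, "C": [3] * 5})

calls = {"count": 0}  # hypothetical counter, only for illustration

def function(A, B, C):
    calls["count"] += 1
    return (A + 1, B / 2, C * 3)

# Row-wise: function runs once per row on scalar values
calls["count"] = 0
row_wise = df.apply(lambda r: pd.Series(function(*r), index=r.index), axis=1)
row_calls = calls["count"]

# Column-wise: function runs once, on whole columns
calls["count"] = 0
col_wise = pd.concat(function(*[col for _, col in df.items()]), axis=1)
col_calls = calls["count"]

print(row_calls, col_calls)
```

On five rows the difference is negligible, but the row-wise count grows with the length of the DataFrame.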

You say the function returns "a tuple, of course", but passing separate columns and returning a tuple of them is not a great way to manipulate pandas data structures. The way your function is set up, you want to take the DataFrame apart into separate columns, pass them as separate arguments, retrieve the separate columns as a tuple, and then at the end combine them back into a DataFrame. But there is already a data structure to hold multiple columns, namely a DataFrame. So if you want your function to take some DataFrame columns and return some DataFrame columns, you should just make it accept a DataFrame and return a DataFrame:

def function(df):
    return pandas.concat([df.A+1, df.B/2, df.C*3], axis=1)

(If you don't want the function to depend on the column names, you could have it access the columns by numerical index instead.) Then you can just call the function directly on the DataFrame:

>>> function(df)
    A  B  C
0  2  1  9
1  2  1  9
2  2  1  9
3  2  1  9
4  2  1  9
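Here is a minimal sketch of the positional variant mentioned in the parenthetical above. It uses `iloc` so the function no longer depends on the labels A, B and C. It assumes the columns to transform are the first three, in that order, and the labels x, y and z are made up for the example.

```python
import pandas as pd

def function(df):
    # Select the first three columns by position, whatever their labels are
    first, second, third = (df.iloc[:, i] for i in range(3))
    return pd.concat([first + 1, second / 2, third * 3], axis=1)

df = pd.DataFrame({"x": [1] * 5, "y": [2] * 5, "z": [3] * 5})
result = function(df)
print(result)
```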

Of course, if you get the function from somewhere else, you may not be able to rewrite it, in which case you can use the sort of solution I mentioned earlier.

BrenBarn

There are two parts to your problem: the axis, and how the function is applied (through a wrapper function or a lambda).

  1. Axis

You need to apply the function to each row; for this, you need to specify axis=1:

df.apply(function, axis=1)

Otherwise, the function is applied to each column (the default, axis=0).
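One quick way to see what each call receives is to return the `.name` of the Series that `apply` passes in. This is a minimal sketch on the same 5x3 frame: with the default axis each call gets a column, labeled by its column name, and with `axis=1` each call gets a row, labeled by its index value.

```python
import pandas as pd

df = pd.DataFrame({"A": [1] * 5, "B": [2] * 5, "C": [3] * 5})

# axis=0 (the default): each call receives one column; .name is the column label
per_column = df.apply(lambda s: s.name)
# axis=1: each call receives one row; .name is the row's index label
per_row = df.apply(lambda s: s.name, axis=1)

print(per_column.tolist())  # ['A', 'B', 'C']
print(per_row.tolist())     # [0, 1, 2, 3, 4]
```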

  2. Wrapper Function

You need to pass apply a function that takes a single argument (the row) and calls your function from inside it. (I assume that function already exists, e.g. from a library, and that you cannot modify it.)

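def functionwrap(row):
    # Unpack the row by position; .iloc avoids the deprecated row[0] positional lookup
    return function(row.iloc[0], row.iloc[1], row.iloc[2])

# result_type="expand" turns each returned tuple into a row of separate columns
df.apply(functionwrap, axis=1, result_type="expand")

  3. Lambda Function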

Even more compact is to pass a lambda function, which works well here because the wrapper is so simple:

df.apply(lambda x: function(*x), axis=1, result_type="expand")

As a reminder, *x unpacks the row into separate positional arguments, effectively calling function(a, b, c). This only works with a DataFrame that has exactly 3 columns; otherwise you will get a TypeError saying that function received n arguments instead of 3, where n is the number of columns in your DataFrame.
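Here is a self-contained sketch of both variants, assuming pandas 2.x. With `result_type="expand"` the new columns are numbered 0, 1, 2, so `set_axis` is used here to put the original labels back. The last part adds a made-up fourth column D to show the TypeError described above.

```python
import pandas as pd

def function(A, B, C):
    return (A + 1, B / 2, C * 3)

df = pd.DataFrame({"A": [1] * 5, "B": [2] * 5, "C": [3] * 5})

def functionwrap(row):
    # .iloc selects by position within the row Series
    return function(row.iloc[0], row.iloc[1], row.iloc[2])

# Expand each returned tuple into columns 0, 1, 2, then restore the labels
wrapped = df.apply(functionwrap, axis=1, result_type="expand").set_axis(df.columns, axis=1)
unpacked = df.apply(lambda x: function(*x), axis=1, result_type="expand").set_axis(df.columns, axis=1)
print(wrapped)

# With a fourth column, *x passes four arguments and function raises TypeError
error_message = ""
try:
    df.assign(D=4).apply(lambda x: function(*x), axis=1)
except TypeError as exc:
    error_message = str(exc)
print(error_message)
```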

birdypme
 df.A, df.B, df.C = function(df.A, df.B, df.C)

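This works because pandas arithmetic is vectorized: when you pass in whole Series, A+1, B/2 and C*3 are each computed for every row at once, and the returned tuple of three Series is unpacked straight back into the columns.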
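Here is a runnable version of the one-liner, assuming Python 3 (so column B becomes float). It uses bracket indexing instead of attribute access. Both forms assign to columns that already exist, but only brackets can create a new column; assigning to an unknown attribute does not.

```python
import pandas as pd

def function(A, B, C):
    return (A + 1, B / 2, C * 3)

df = pd.DataFrame({"A": [1] * 5, "B": [2] * 5, "C": [3] * 5})

# Each argument is a whole column, so each operation covers all rows at once;
# the returned tuple of three Series is unpacked back into the three columns.
df["A"], df["B"], df["C"] = function(df["A"], df["B"], df["C"])
print(df)
```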

cphlewis