How to apply a function to two columns of Pandas dataframe

Question

Suppose I have a df with the columns 'ID', 'col_1' and 'col_2', and I define a function:

f = lambda x, y : my_function_expression.

Now I want to apply f to the two columns 'col_1' and 'col_2' to calculate a new column 'col_3' element-wise, something like:

df['col_3'] = df[['col_1','col_2']].apply(f)  
# Pandas gives : TypeError: ('<lambda>() takes exactly 2 arguments (1 given)'

How can I do this?

** Added a detailed example below ***

import pandas as pd

df = pd.DataFrame({'ID':['1','2','3'], 'col_1': [0,2,3], 'col_2':[1,4,5]})
mylist = ['a','b','c','d','e','f']

def get_sublist(sta,end):
    return mylist[sta:end+1]

#df['col_3'] = df[['col_1','col_2']].apply(get_sublist,axis=1)
# I expect the line above to produce this df:

  ID  col_1  col_2            col_3
0  1      0      1       ['a', 'b']
1  2      2      4  ['c', 'd', 'e']
2  3      3      5  ['d', 'e', 'f']

I found a related Q&A at the URL below, but my issue is calculating a new column from two existing columns, not two new columns from one. http://stackoverflow.com/questions/12356501/pandas-create-two-new-columns-in-a-dataframe-with-values-calculated-from-a-pre?rq=1 — bigbug, Nov 11 '12 at 14:22

ajrwhite · Answer 1 · 2020-04-24T15:13:31.700

634

There is a clean, one-line way of doing this in Pandas:

df['col_3'] = df.apply(lambda x: f(x.col_1, x.col_2), axis=1)

This allows f to be a user-defined function with multiple input values, and uses (safe) column names rather than (unsafe) numeric indices to access the columns.

Example with data (based on original question):

import pandas as pd

df = pd.DataFrame({'ID':['1', '2', '3'], 'col_1': [0, 2, 3], 'col_2':[1, 4, 5]})
mylist = ['a', 'b', 'c', 'd', 'e', 'f']

def get_sublist(sta,end):
    return mylist[sta:end+1]

df['col_3'] = df.apply(lambda x: get_sublist(x.col_1, x.col_2), axis=1)

Output of print(df):

  ID  col_1  col_2      col_3
0  1      0      1     [a, b]
1  2      2      4  [c, d, e]
2  3      3      5  [d, e, f]

If your column names contain spaces or share a name with an existing dataframe attribute, you can index with square brackets:

df['col_3'] = df.apply(lambda x: f(x['col 1'], x['col 2']), axis=1)
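A minimal runnable sketch of the square-bracket form, assuming hypothetical column names `col 1` and `col 2` (with spaces, so attribute access is impossible) and reusing the question's `get_sublist` in place of `f`:

```python
import pandas as pd

mylist = ['a', 'b', 'c', 'd', 'e', 'f']

def get_sublist(sta, end):
    return mylist[sta:end + 1]

# Hypothetical column names containing spaces: x.col 1 is not valid Python
df = pd.DataFrame({'ID': ['1', '2', '3'], 'col 1': [0, 2, 3], 'col 2': [1, 4, 5]})

# Square-bracket indexing on the row Series works for any column label
df['col_3'] = df.apply(lambda x: get_sublist(x['col 1'], x['col 2']), axis=1)
print(df)
```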

edited Apr 24 '20 at 15:13

answered Oct 17 '18 at 12:22

ajrwhite



Note: if you use `axis=1` and your column is called `name`, accessing it will not return your column data but the `index`, similar to getting the `name` in a `groupby()`. I solved this by renaming my column. – Tom Hemmes May 22 '19 at 13:58

THIS IS IT! I just didn't realize you could insert user-defined functions with multiple input parameters into lambdas. It's important to note (I think) that you're using DF.apply() rather than Series.apply(). This lets you index the df using the two columns you want, and pass the entire column into the function, but because you're using apply(), it applies the function in an element-wise fashion down the whole column. Brilliant! Thank you for posting! – Data-phile May 31 '19 at 22:36
I believe the suggested way to do this is df.loc[:, 'new col'] = df.apply..... – valearner Feb 10 '20 at 19:38
@valearner I don't think there's any reason to prefer `.loc` in the example. It might be needed if you adapt this to another problem setting (e.g. working with slices). – ajrwhite Feb 17 '20 at 02:05
Thanks for sharing that. What if your columns' names have spaces in them? – Mez13 Apr 23 '20 at 14:33

@Mez13 you can also use `f(x['col 1'], x['col 2'])` style indexing if necessary (e.g. if your column names have spaces or protected names). – ajrwhite Apr 24 '20 at 15:09
@ajrwhite thanks. In the meantime, I have used `np.vectorize`. It's quite flexible – Mez13 Apr 24 '20 at 15:34
Can you explain df['col_3'] = df.apply(lambda x: f(x['col 1'], x['col 2']), axis=1)? – Dinh Quang Tuan Jun 26 '20 at 10:33
df['col_3'] = df.apply(df['col 1'], df['col 2']), axis=1) does not work, why? – Dinh Quang Tuan Jun 26 '20 at 10:33
This method may throw `ValueError: Shape of passed values is (…, …), indices imply (…, …)` with older versions of `pandas` (e.g. 0.20.3). – Skippy le Grand Gourou Dec 18 '20 at 13:16
This is a very good answer! Finally! It solves a much more general problem. Namely, `df['C'] = my_function(df['A'])` will not work. However, `df['C'] = df.apply(lambda x: my_function(x.A), axis=1)` will work. – SomJura May 03 '21 at 13:04
@ajrwhite What do "safe" and "unsafe" mean here in reference to column names v numeric indices? Also, thank you for the supremely helpful answer! – BLimitless May 17 '21 at 18:47
@BLimitless if you drop columns or sort/reorder them, the numeric indexing will fail (hence "unsafe"), but referencing by column name will continue to work. – ajrwhite Jun 10 '21 at 12:56
I had upvoted this kick-ass answer sometime in the distant past. Has aged extraordinarily well – WestCoastProjects Aug 24 '22 at 05:36
This doesn't work for rolling applies – Tom Nov 03 '22 at 23:38
This works fine. Note that you don't need a lambda (an unnamed function), since you already have a named function. https://pastebin.com/yKnyGtKe . It looks much cleaner to simply call `df.apply(get_sublist, axis=1)`. – Eric Duminil Jul 07 '23 at 07:25
@EricDuminil you need the lambda to split the variables - have you tried your solution on the original question? – ajrwhite Jul 09 '23 at 01:26
Yes, my proposed code works fine with the question's example, and it's self-contained. You can run it. If you have the possibility to modify the original function, you can simply let it accept a pandas row. You don't necessarily need a lambda for df.apply. – Eric Duminil Jul 09 '23 at 13:13
@EricDuminil your proposed solution doesn’t specify which columns are passed to the function. It is intended for a multi column (n cols > 2) DataFrame and we are passing specific columns to the function. That’s why a standard apply doesn’t work, and then you need the lambda to achieve it. I think you need to work through the example to understand why the lambda is being used – ajrwhite Jul 10 '23 at 15:11
@ajrwhite: Sorry, I don't understand your comment. Yes, I modified the function. It accepts a pandas row (a `pd.Series`, as far as I can tell), and the columns are specified inside the function. My point is simply that a lambda isn't necessarily required for `df.apply`. – Eric Duminil Jul 10 '23 at 18:13

Aman · Answer 2 · 2012-11-12T14:57:18.917

475

Here's an example using apply on the dataframe, which I am calling with axis = 1.

Note the difference is that instead of trying to pass two values to the function f, rewrite the function to accept a pandas Series object, and then index the Series to get the values needed.

In [49]: df
Out[49]: 
          0         1
0  1.000000  0.000000
1 -0.494375  0.570994
2  1.000000  0.000000
3  1.876360 -0.229738
4  1.000000  0.000000

In [50]: def f(x):    
   ....:  return x[0] + x[1]  
   ....:  

In [51]: df.apply(f, axis=1) #passes a Series object, row-wise
Out[51]: 
0    1.000000
1    0.076619
2    1.000000
3    1.646622
4    1.000000

Depending on your use case, it is sometimes helpful to create a pandas group object, and then use apply on the group.
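The group-based variant is not shown above. A minimal sketch, assuming hypothetical columns `key`, `value` and `weight`, where the function needs two columns of each group at once (here a weighted mean):

```python
import pandas as pd

# Hypothetical data: a grouping key plus two numeric columns
df = pd.DataFrame({
    'key':    ['a', 'b', 'a', 'b'],
    'value':  [1, 2, 3, 4],
    'weight': [1, 1, 1, 3],
})

def weighted_mean(g):
    # g is the sub-DataFrame for one group; both columns are used together
    return (g['value'] * g['weight']).sum() / g['weight'].sum()

# Selecting the columns first keeps the grouping column out of g
result = df.groupby('key')[['value', 'weight']].apply(weighted_mean)
print(result)
```

The function is called once per group rather than once per row, so it pays off when the calculation genuinely depends on several rows at a time.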

edited Nov 12 '12 at 14:57

answered Nov 12 '12 at 01:39

Aman


Yes, I tried to use apply, but can't find the valid syntax. And if each row of df is unique, should I still use groupby? – bigbug Nov 12 '12 at 10:42
Added an example to my answer, hope this does what you're looking for. If not, please provide a more specific example function since `sum` is solved successfully by any of the methods suggested so far. – Aman Nov 12 '12 at 14:51
I provided a detailed example in the question. How do I use Pandas' 'apply' function to create 'col_3'? – bigbug Nov 13 '12 at 13:02
@bigbug My answer is apply-cable (haha) for the example you added to your question. Use apply on the whole dataframe, passing in rows with df.apply(f, axis=1). Then rewrite your function `get_sublist(x)` to index the col values like this `start_idx = x[1], end_idx = x[2]`. – Aman Nov 13 '12 at 15:49

Would you please paste your code? I rewrote the function: def get_sublist(x): return mylist[x[1]:x[2] + 1] and df['col_3'] = df.apply(get_sublist, axis=1) gives 'ValueError: operands could not be broadcast together with shapes (2) (3)' – bigbug Nov 16 '12 at 07:11

@Aman: with Pandas version 0.14.1 (and possibly earlier), you can use a lambda expression as well. Given the `df` object you defined, another approach (with equivalent results) is `df.apply(lambda x: x[0] + x[1], axis = 1)`. – Jubbles Jan 10 '15 at 01:37
Is it possible to have `f()` return a `dict`? When I try that, I get ` – scharfmn Aug 23 '15 at 10:01
@Jubbles: Yeah, good point. In fact, the OP used lambdas (back in 2012!). I was just matching the OP's format in my answer. – Aman Aug 25 '15 at 00:26
@bahmait Should be fine returning a dict. Maybe start a new question if you're having issues. – Aman Aug 25 '15 at 00:28
this does not work anymore: `TypeError: ('f() takes exactly 2 arguments (1 given)', u'occurred at index 0')` – denfromufa Nov 12 '15 at 14:59
@Aman thanks for your answer. Is there a way to pass an argument as well? My aim is actually to pass column indexes as a parameter as well, so that when the order of the columns changes, we can easily send different indexes. – CanCeylan Aug 07 '17 at 12:43

@CanCeylan you can just use the column names in the function instead of indexes then you don't need to worry about order changing, or get the index by name e.g. see https://stackoverflow.com/questions/13021654/get-column-index-from-column-name-in-python-pandas – Davos Mar 22 '18 at 07:19
When you use apply, does it send df as the variable x? – haneulkim Sep 30 '19 at 04:09
I am missing an example with groupby that is being alluded to at the bottom. Would be nice to illustrate. – DISC-O Oct 02 '22 at 19:35

score 172 · Answer 3 · answered Aug 31 '16 at 21:39


A simple solution is:

df['col_3'] = df[['col_1','col_2']].apply(lambda x: f(*x), axis=1)
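To make the one-liner concrete, here it is with the question's data and `get_sublist` standing in for `f`. `*x` unpacks the values of each row Series into positional arguments, so the column order in `df[['col_1','col_2']]` has to match the function's parameter order:

```python
import pandas as pd

df = pd.DataFrame({'ID': ['1', '2', '3'], 'col_1': [0, 2, 3], 'col_2': [1, 4, 5]})
mylist = ['a', 'b', 'c', 'd', 'e', 'f']

def get_sublist(sta, end):
    return mylist[sta:end + 1]

# Each row of the two-column frame is unpacked into (sta, end)
df['col_3'] = df[['col_1', 'col_2']].apply(lambda x: get_sublist(*x), axis=1)
print(df)
```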


sjm



How is this answer different from the approach in the question, df['col_3'] = df[['col_1','col_2']].apply(f)? Just to confirm: the approach in the question didn't work because the poster did not specify axis=1, and the default is axis=0? – Lost1 Jul 19 '17 at 19:41

This answer is comparable to @Aman's answer but a bit slicker. He is constructing an anonymous function which takes an iterable, and unpacks it before passing it to function f. – tiao Nov 10 '17 at 15:15

This method is twice as fast in my case, with 100k rows (compared to `df.apply(lambda x: f(x.col_1, x.col_2), axis=1)`) – Sylvain Dec 04 '20 at 00:13

@sjm Nice! But what if the arguments of x are a mixture of args and kwargs etc.? – jtlz2 Mar 31 '22 at 13:05

score 46 · Answer 4 · 2015-04-24T02:39:14.127

An interesting question! My answer is below:

import pandas as pd

def sublst(row):
    return lst[row['J1']:row['J2']]

df = pd.DataFrame({'ID':['1','2','3'], 'J1': [0,2,3], 'J2':[1,4,5]})
print(df)
lst = ['a','b','c','d','e','f']

df['J3'] = df.apply(sublst,axis=1)
print(df)

Output:

  ID  J1  J2
0  1   0   1
1  2   2   4
2  3   3   5
  ID  J1  J2      J3
0  1   0   1     [a]
1  2   2   4  [c, d]
2  3   3   5  [d, e]

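I renamed the columns to ID, J1, J2, J3 to ensure ID < J1 < J2 < J3, so that the columns display in the right order. Note that, unlike the question's get_sublist, these slices have no +1, so the end position is excluded (hence [a] rather than [a, b]).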

One more brief version:

import pandas as pd

df = pd.DataFrame({'ID':['1','2','3'], 'J1': [0,2,3], 'J2':[1,4,5]})
print(df)
lst = ['a','b','c','d','e','f']

df['J3'] = df.apply(lambda row:lst[row['J1']:row['J2']],axis=1)
print(df)

axis=1 is what I was after, thanks. – Quinten C May 07 '22 at 10:12

score 28 · Answer 5 · answered Mar 05 '15 at 15:20

The method you are looking for is Series.combine. However, it seems some care has to be taken around datatypes. In your example, you would (as I did when testing the answer) naively call

df['col_3'] = df.col_1.combine(df.col_2, func=get_sublist)

However, this throws the error:

ValueError: setting an array element with a sequence.

My best guess is that it expects the result to have the same type as the Series calling the method (df.col_1 here). However, the following works:

df['col_3'] = df.col_1.astype(object).combine(df.col_2, func=get_sublist)

df

  ID  col_1  col_2      col_3
0  1      0      1     [a, b]
1  2      2      4  [c, d, e]
2  3      3      5  [d, e, f]

score 25 · Answer 6 · answered Oct 25 '17 at 02:54

Returning a list from apply is a risky operation, because whether the result is a Series or a DataFrame depends on the lengths of the returned lists, and exceptions might be raised in certain cases. Let's walk through a simple example:

import numpy as np
import pandas as pd

df = pd.DataFrame(data=np.random.randint(0, 5, (5,3)),
                  columns=['a', 'b', 'c'])
df
   a  b  c
0  4  0  0
1  2  0  1
2  2  2  2
3  1  2  2
4  3  0  0

There are three possible outcomes when returning a list from apply:

1) If the length of the returned list is not equal to the number of columns, then a Series of lists is returned.

df.apply(lambda x: list(range(2)), axis=1)  # returns a Series
0    [0, 1]
1    [0, 1]
2    [0, 1]
3    [0, 1]
4    [0, 1]
dtype: object

2) When the length of the returned list is equal to the number of columns then a DataFrame is returned and each column gets the corresponding value in the list.

df.apply(lambda x: list(range(3)), axis=1) # returns a DataFrame
   a  b  c
0  0  1  2
1  0  1  2
2  0  1  2
3  0  1  2
4  0  1  2

3) If the length of the returned list equals the number of columns for the first row, but at least one other row returns a list with a different number of elements, a ValueError is raised.

i = 0
def f(x):
    global i
    if i == 0:
        i += 1
        return list(range(3))
    return list(range(4))

df.apply(f, axis=1) 
ValueError: Shape of passed values is (5, 4), indices imply (5, 3)
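The three outcomes above describe pandas as it was when this answer was written. In more recent pandas versions, list results from `apply(..., axis=1)` are kept as a Series of lists by default, and the `result_type` parameter of `DataFrame.apply` lets you ask for expansion explicitly. A small sketch, assuming a recent pandas:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(data=np.random.randint(0, 5, (5, 3)), columns=['a', 'b', 'c'])

# Default (result_type=None): list results stay as one object column of lists
as_series = df.apply(lambda x: list(range(3)), axis=1)

# result_type='expand': each list is spread across new columns 0, 1, 2
expanded = df.apply(lambda x: list(range(3)), axis=1, result_type='expand')

print(type(as_series).__name__, as_series.dtype)
print(expanded.shape)
```

Passing `result_type` explicitly makes the output shape independent of how many columns the input happens to have.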

Answering the problem without apply

Using apply with axis=1 is very slow. It is possible to get much better performance (especially on larger datasets) with basic iterative methods.

Create a larger dataframe (from the question's df):

df1 = df.sample(100000, replace=True).reset_index(drop=True)

Timings

# apply is slow with axis=1
%timeit df1.apply(lambda x: mylist[x['col_1']: x['col_2']+1], axis=1)
2.59 s ± 76.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

# zip - similar to @Thomas
%timeit [mylist[v1:v2+1] for v1, v2 in zip(df1.col_1, df1.col_2)]  
29.5 ms ± 534 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

@Thomas's answer:

%timeit list(map(get_sublist, df1['col_1'],df1['col_2']))
34 ms ± 459 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
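For reference, a self-contained version of the `zip` approach on the question's small example, so the result can be checked directly:

```python
import pandas as pd

df = pd.DataFrame({'ID': ['1', '2', '3'], 'col_1': [0, 2, 3], 'col_2': [1, 4, 5]})
mylist = ['a', 'b', 'c', 'd', 'e', 'f']

# zip pairs the two columns element-wise; no row Series is built per row
df['col_3'] = [mylist[v1:v2 + 1] for v1, v2 in zip(df['col_1'], df['col_2'])]
print(df)
```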

It's nice to see such detailed answers to learn from. — Andrea Moro, Feb 14 '20 at 02:19
For the latest pandas version (1.3.1), the returned list is preserved and all three examples above work fine: every result is a pd.Series with dtype='object'. BUT df.apply(f, axis=0) still behaves as described above. It's strange that pd.DataFrame.apply breaks the symmetry, which means df.T.apply(f, axis=0).T is not always the same as df.apply(f, axis=1). For example, when `f = lambda x: list(range(2))`, `df.T.apply(f, axis=0).T` and `df.apply(f, axis=1)` are not the same. — KH Kim, Aug 04 '21 at 06:03

Rivers · Answer 7 · 2022-07-10T14:10:57.820

Here is a faster solution:

import pandas as pd
df = pd.DataFrame({"A": [1, 2, 3], "B": [10, 20, 30]})  # example data

def func_1(a, b):
    return a + b

df["C"] = func_1(df["A"].to_numpy(), df["B"].to_numpy())

This is about 380 times faster than df['col_3'] = df.apply(lambda x: f(x.col_1, x.col_2), axis=1) from @ajrwhite and about 310 times faster than df.apply(f, axis=1) from @Aman.

I added some benchmarks too:

Results:

FUNCTIONS         TIMING (s per 10 runs)   GAIN vs apply lambda
apply lambda      0.7                      x 1
apply             0.56                     x 1.25
map               0.3                      x 2.3
np.vectorize      0.01                     x 70
f3 on Series      0.0026                   x 270
f3 on np arrays   0.0018                   x 380
f3 numba          0.0018                   x 380

In short:

Using apply is slow. We can speed things up very simply by using a function that operates directly on Pandas Series (or, better, on NumPy arrays). Because we operate on whole Series or arrays, the operations are vectorized. The function returns a Pandas Series or NumPy array, which we assign as a new column.
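A minimal check, assuming a small hypothetical two-column frame, that the vectorized calls produce the same column as the row-wise `apply` they replace. This works because `+` is itself vectorized; a function built from Python-level operations such as list slicing (like the question's `get_sublist`) cannot be passed whole columns this way.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [10, 20, 30]})

def f3(a, b):
    # Works on scalars, Series and NumPy arrays alike, because + is vectorized
    return a + b

row_wise = df.apply(lambda row: f3(row["A"], row["B"]), axis=1)  # one call per row
on_series = f3(df["A"], df["B"])                                  # one call in total
on_arrays = f3(df["A"].to_numpy(), df["B"].to_numpy())            # one call, no index alignment

print(on_arrays)
```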

And here is the benchmark code:

import timeit

timeit_setup = """
import pandas as pd
import numpy as np
import numba

np.random.seed(0)

# Create a DataFrame of 10000 rows with 2 columns "A" and "B" 
# containing integers between 0 and 9
df = pd.DataFrame(np.random.randint(0,10,size=(10000, 2)), columns=["A", "B"])

def f1(a,b):
    # Here a and b are the values of column A and B for a specific row: integers
    return a + b

def f2(x):
    # Here, x is pandas Series, and corresponds to a specific row of the DataFrame
    # 0 and 1 are the positions of columns A and B
    return x.iloc[0] + x.iloc[1]

def f3(a,b):
    # Same as f1 but we will pass parameters that will allow vectorization
    # Here, A and B will be Pandas Series or numpy arrays
    # with df["C"] = f3(df["A"],df["B"]): Pandas Series
    # with df["C"] = f3(df["A"].to_numpy(),df["B"].to_numpy()): numpy arrays
    return a + b

@numba.njit('int64[:](int64[:], int64[:])')
def f3_numba_vectorize(a,b):
    # Here a and b are 2 numpy arrays with dtype int64
    # This function must return a numpy array with dtype int64
    return a + b

"""

test_functions = [
'df["C"] = df.apply(lambda row: f1(row["A"], row["B"]), axis=1)',
'df["C"] = df.apply(f2, axis=1)',
'df["C"] = list(map(f3,df["A"],df["B"]))',
'df["C"] = np.vectorize(f3) (df["A"].to_numpy(),df["B"].to_numpy())',
'df["C"] = f3(df["A"],df["B"])',
'df["C"] = f3(df["A"].to_numpy(),df["B"].to_numpy())',
'df["C"] = f3_numba_vectorize(df["A"].to_numpy(),df["B"].to_numpy())'
]


for test_function in test_functions:
    print(min(timeit.repeat(setup=timeit_setup, stmt=test_function, repeat=7, number=10)))

Output:

Final note: things could be optimized further with Cython and other numba tricks too.

score 19 · Answer 8 · answered Aug 11 '16 at 00:57

I'm sure this isn't as fast as the solutions using Pandas or NumPy operations, but if you don't want to rewrite your function you can use map. Using the original example data:

import pandas as pd

df = pd.DataFrame({'ID':['1','2','3'], 'col_1': [0,2,3], 'col_2':[1,4,5]})
mylist = ['a','b','c','d','e','f']

def get_sublist(sta,end):
    return mylist[sta:end+1]

df['col_3'] = list(map(get_sublist,df['col_1'],df['col_2']))
# In Python 2, map already returns a list, so list() is not needed

We can pass as many arguments into the function as we want this way. The output is as expected:

  ID  col_1  col_2      col_3
0  1      0      1     [a, b]
1  2      2      4  [c, d, e]
2  3      3      5  [d, e, f]
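Because `map` accepts any number of iterables, extra arguments can be supplied the same way. A sketch, assuming a hypothetical variant `get_sublist_step` with a third `step` parameter that is held constant via `itertools.repeat` (map stops at the shortest iterable, so the infinite `repeat` is safe):

```python
import itertools
import pandas as pd

df = pd.DataFrame({'ID': ['1', '2', '3'], 'col_1': [0, 2, 3], 'col_2': [1, 4, 5]})
mylist = ['a', 'b', 'c', 'd', 'e', 'f']

def get_sublist_step(sta, end, step):
    # Hypothetical variant of get_sublist with an extra slicing step
    return mylist[sta:end + 1:step]

# Two columns plus a constant third argument
df['col_3'] = list(map(get_sublist_step, df['col_1'], df['col_2'], itertools.repeat(2)))
print(df)
```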

This is actually much faster than the answers that use `apply` with `axis=1` — Ted Petrou, Oct 25 '17 at 01:06
This is 4 years later, but such a fast idiom compared to apply! Thanks from the future. — Chris, Sep 24 '21 at 01:23

score 17 · Answer 9 · answered Apr 08 '16 at 00:59

I'm going to put in a vote for np.vectorize. It lets you pass any number of columns without dealing with the dataframe inside the function, so it's great for functions you don't control, or for sending two columns and a constant into a function (e.g. col_1, col_2, 'foo').

import numpy as np
import pandas as pd

df = pd.DataFrame({'ID':['1','2','3'], 'col_1': [0,2,3], 'col_2':[1,4,5]})
mylist = ['a','b','c','d','e','f']

def get_sublist(sta,end):
    return mylist[sta:end+1]

#df['col_3'] = df[['col_1','col_2']].apply(get_sublist,axis=1)
# expect above to output df as below 

df.loc[:,'col_3'] = np.vectorize(get_sublist, otypes=["O"]) (df['col_1'], df['col_2'])


df

  ID  col_1  col_2      col_3
0  1      0      1     [a, b]
1  2      2      4  [c, d, e]
2  3      3      5  [d, e, f]
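The "two columns and a constant" case can be sketched like this, assuming a hypothetical `join_sublist` function whose third argument is a separator string. `np.vectorize` broadcasts the scalar `'-'` against both columns, just as NumPy broadcasting would:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'ID': ['1', '2', '3'], 'col_1': [0, 2, 3], 'col_2': [1, 4, 5]})
mylist = ['a', 'b', 'c', 'd', 'e', 'f']

def join_sublist(sta, end, sep):
    # Hypothetical function: join the selected letters with a separator
    return sep.join(mylist[sta:end + 1])

# The scalar '-' is broadcast against both columns
df['col_3'] = np.vectorize(join_sublist, otypes=[object])(df['col_1'], df['col_2'], '-')
print(df)
```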

The question is "How to apply a function to two columns of Pandas dataframe", not "How to apply a function to two columns of Pandas dataframe using only Pandas methods". NumPy is a dependency of Pandas, so you have to have it installed anyway; this seems like a strange objection. — Trae Wallace, May 19 '16 at 15:43

score 13 · Answer 10 · answered May 30 '13 at 00:53

The way you have written f, it needs two inputs. The error message says you are providing only one input to f, not two, and the error message is correct.
The mismatch is because df[['col_1','col_2']] returns a single dataframe with two columns, not two separate columns.

You need to change your f so that it takes a single input, keep calling apply on the above data frame, then break that input up into x, y inside the function body. Then do whatever you need and return a single value.

You need this function signature because the syntax is .apply(f), so f needs to take a single argument (apply passes it one Series per call), not the two arguments your current f expects.

Since you haven't provided the body of f, I can't help in more detail, but this should provide a way out without fundamentally changing your code or switching from apply to some other method.
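A minimal sketch of that restructuring on the question's data. With `axis=1`, `apply` calls the function once per row and passes the row as a single Series, which the function then splits into `x` and `y` (the name `f_row` is just for illustration):

```python
import pandas as pd

df = pd.DataFrame({'ID': ['1', '2', '3'], 'col_1': [0, 2, 3], 'col_2': [1, 4, 5]})
mylist = ['a', 'b', 'c', 'd', 'e', 'f']

def f_row(row):
    # One input: the row Series; break it up into x and y inside the body
    x, y = row['col_1'], row['col_2']
    return mylist[x:y + 1]

df['col_3'] = df[['col_1', 'col_2']].apply(f_row, axis=1)
print(df)
```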

Janosh · Answer 11 · 2021-12-08T08:43:50.467

9

Another option is df.itertuples() (generally faster than, and recommended over, df.iterrows() by the docs and user testing):

import pandas as pd

df = pd.DataFrame([range(4) for _ in range(4)], columns=list("abcd"))

df
    a   b   c   d
0   0   1   2   3
1   0   1   2   3
2   0   1   2   3
3   0   1   2   3


df["e"] = [sum(row) for row in df[["b", "d"]].itertuples(index=False)]

df
    a   b   c   d   e
0   0   1   2   3   4
1   0   1   2   3   4
2   0   1   2   3   4
3   0   1   2   3   4

Since itertuples returns an Iterable of namedtuples, you can access tuple elements both as attributes by column name (aka dot notation) and by index:

b, d = row
b = row.b
d = row[1]
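Applied to the question's data, the namedtuple fields can be passed straight to the original two-argument `get_sublist`, with no changes to the function:

```python
import pandas as pd

df = pd.DataFrame({'ID': ['1', '2', '3'], 'col_1': [0, 2, 3], 'col_2': [1, 4, 5]})
mylist = ['a', 'b', 'c', 'd', 'e', 'f']

def get_sublist(sta, end):
    return mylist[sta:end + 1]

# Each row is a namedtuple with fields col_1 and col_2
df['col_3'] = [get_sublist(row.col_1, row.col_2)
               for row in df[['col_1', 'col_2']].itertuples(index=False)]
print(df)
```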

edited Dec 08 '21 at 08:43

answered Nov 30 '21 at 12:15

Janosh



From my experience, `itertuples` is sometimes much faster than `df.apply(..., axis=1)`. For large tables I have seen the time going from around 3 minutes (using `apply`) down to 10 seconds (using `itertuples`). Personally I also think `itertuples` is sometimes more readable; it reads like pseudocode. Note that elements of the tuples can be accessed either by name or by position (i.e., in the answer above where `index=False`, `row.b` is equivalent to `row[0]`). – DustByte Dec 06 '21 at 13:03

score 7 · Answer 12 · answered May 25 '17 at 09:36


My answer to your question:

def get_sublist(row, col1, col2):
    return mylist[row[col1]:row[col2]+1]
df.apply(get_sublist, axis=1, col1='col_1', col2='col_2')
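A self-contained version with the question's data. `DataFrame.apply` forwards keyword arguments it does not use itself to the function, so the column names become parameters and the function keeps working if the column order changes:

```python
import pandas as pd

df = pd.DataFrame({'ID': ['1', '2', '3'], 'col_1': [0, 2, 3], 'col_2': [1, 4, 5]})
mylist = ['a', 'b', 'c', 'd', 'e', 'f']

def get_sublist(row, col1, col2):
    # col1/col2 name the columns to read, so they can change without editing the function
    return mylist[row[col1]:row[col2] + 1]

# col1 and col2 are passed through to get_sublist for every row
df['col_3'] = df.apply(get_sublist, axis=1, col1='col_1', col2='col_2')
print(df)
```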


Qing Liu


score 6 · Answer 13 · answered Apr 14 '22 at 20:23

It can be done in two simple ways. Let's say we want the sum of col_1 and col_2 in an output column named col_sum.

Method 1

f = lambda x: x.col_1 + x.col_2
df['col_sum'] = df.apply(f, axis=1)

Method 2

def f(x):
    x['col_sum'] = x.col_1 + x.col_2
    return x
df = df.apply(f, axis=1)

Method 2 should be used when a more complex function has to be applied to the dataframe. Method 2 can also be used when output in multiple columns is required.
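For the multiple-output case, a sketch that returns a new `pd.Series` per row instead of mutating the row (a swapped-in variation of Method 2; `col_sum` and `col_diff` are hypothetical output names). `apply` turns the per-row Series into a DataFrame, which is then joined back on the index:

```python
import pandas as pd

df = pd.DataFrame({'col_1': [0, 2, 3], 'col_2': [1, 4, 5]})

def f(x):
    # Returning a Series makes apply build one output column per label
    return pd.Series({'col_sum': x.col_1 + x.col_2,
                      'col_diff': x.col_2 - x.col_1})

df = df.join(df.apply(f, axis=1))
print(df)
```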

How would you do this with a Rolling.apply? – Tom Nov 03 '22 at 23:38

score 4 · Answer 14 · answered Apr 25 '18 at 16:27

I suppose you don't want to change the get_sublist function, and just want to use DataFrame's apply method to do the job. To get the result you want, I've written two helper functions: get_sublist_list and unlist. As the names suggest, the first wraps the sublist in a list, and the second extracts the sublist from that list again. Finally, we call apply to apply those two functions to the df[['col_1','col_2']] DataFrame in turn.

import pandas as pd

df = pd.DataFrame({'ID':['1','2','3'], 'col_1': [0,2,3], 'col_2':[1,4,5]})
mylist = ['a','b','c','d','e','f']

def get_sublist(sta,end):
    return mylist[sta:end+1]

def get_sublist_list(cols):
    return [get_sublist(cols[0],cols[1])]

def unlist(list_of_lists):
    return list_of_lists[0]

df['col_3'] = df[['col_1','col_2']].apply(get_sublist_list,axis=1).apply(unlist)

df

If you don't wrap the get_sublist call in [], get_sublist_list will return a plain list, and it'll raise ValueError: could not broadcast input array from shape (3) into shape (2), as @Ted Petrou had mentioned.

score 3 · Answer 15 · answered Aug 30 '19 at 04:33

If you have a huge dataset, you can use an easy and faster (in execution time) way of doing this with swifter:

import pandas as pd
import swifter

def fnc(m,x,c):
    return m*x+c

df = pd.DataFrame({"m": [1,2,3,4,5,6], "c": [1,1,1,1,1,1], "x":[5,3,6,2,6,1]})
df["y"] = df.swifter.apply(lambda x: fnc(x.m, x.x, x.c), axis=1)
