Dataframe list comprehension "zip(...)": loop through chosen df columns efficiently with just a list of column name strings

Question

This is just a nitpicking syntactic question...

I have a dataframe, and I want to use a list comprehension to evaluate a function that takes many of its columns as arguments.

I know I can do this

df['result_col'] = [some_func(*var) for var in zip(df['col_1'], df['col_2'],... ,df['col_n'])]

I would like to do something like this

df['result_col'] = [some_func(*var) for var in zip(df[['col_1', 'col_2',... ,'col_n']])]

i.e. pass just a list of column names, without having to write df n times. I cannot for the life of me figure out the syntax.

try this `df['result_col'] = [some_func(*var) for var in zip(*df[col for col in ['col_1', 'col_2',... ,'col_n']])]`? — deadvoid, Oct 02 '18 at 12:07
Why don't you just use apply: `df['result_col'] = df.apply(lambda x: some_func(*tuple(x.values)), axis=1)` ? — gyx-hh, Oct 02 '18 at 12:10
@cryptonome seems to be a syntax error somewhere.. missing a bracket or parenthesis? — mortysporty, Oct 02 '18 at 12:16
@gyx-hh I thought apply was slow. But honestly I didn't even consider it — mortysporty, Oct 02 '18 at 12:17
@cryptonome `df['result_col'] = [some_func(*var) for var in zip(*[df[col] for col in ['col_1', 'col_2',... ,'col_n']])]` worked. thanks. If you want credit for your answer, post it and I'll tag it as the answer. — mortysporty, Oct 02 '18 at 12:23
@mortysporty perhaps, i didn't try to check the amount of brackets/parens in it. alright then, i'll post an answer, and check the typo first :) — deadvoid, Oct 02 '18 at 12:28
@mortysporty when you apply you're just looping through the dataframe just like what you're doing with list comprehension . — gyx-hh, Oct 02 '18 at 12:33
Possible duplicate of https://stackoverflow.com/questions/40646458/list-comprehension-in-pandas/62062095#62062095 and https://stackoverflow.com/questions/58567199/memory-efficient-way-for-list-comprehension-of-pandas-dataframe-using-multiple-c/62064720#62064720 (though the latter link is younger, so only now it has become a possible duplicate) — questionto42, May 28 '20 at 13:10
@gyx-hh df.apply(), df.itertuples(), df.iteritems(), df.iterrows() are much slower than list comprehension, not recommended, your comment is wrong, apply and list comprehension are not at all equal in speed — questionto42, Jun 25 '20 at 11:36
Thanks for that @Lorenz, was not aware of that - i'm aware df.apply and those methods are slow in general because we are looping, and you should always try to find a different approach that is vectorised - but was not aware list comprehension is faster than df.apply — gyx-hh, Jun 26 '20 at 12:18
Does this answer your question? [list comprehension in pandas](https://stackoverflow.com/questions/40646458/list-comprehension-in-pandas) — questionto42, Jul 08 '22 at 17:44

score 4 · Accepted Answer · answered Oct 02 '18 at 12:32

This should work, though honestly the OP figured it out himself as well, so +1 to the OP :)

df['result_col'] = [some_func(*var) for var in zip(*[df[col] for col in ['col_1', 'col_2',... ,'col_n']])]
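
For anyone who wants to run it, here is a minimal sketch of the same pattern with the elided column list filled in. The data, the three-argument `some_func`, and the `cols` list are made up for illustration and are not from the question.

```python
import pandas as pd

# Hypothetical data and function, for illustration only.
df = pd.DataFrame({
    "col_1": [1, 2, 3],
    "col_2": [10, 20, 30],
    "col_3": [100, 200, 300],
    "unused": ["a", "b", "c"],
})


def some_func(a, b, c):
    """Combine one value from each selected column."""
    return a + b + c


cols = ["col_1", "col_2", "col_3"]

# Build one Series per column name, then zip them so each tuple holds one row's values.
df["result_col"] = [some_func(*var) for var in zip(*[df[col] for col in cols])]

print(df["result_col"].tolist())  # [111, 222, 333]
```

Keeping the names in a list means `df` is written only once, and columns that are not listed (here `unused`) are never touched.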

deadvoid

score 2 · Answer 2 · answered Oct 02 '18 at 12:35

As mentioned in the comments above, you should use apply instead:

df['result_col'] = df.apply(lambda x: some_func(*tuple(x.values)), axis=1)
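
If only some columns should be passed, the same idea can be applied to a column subset. This is a rough sketch with invented data and a hypothetical two-argument `some_func`, not a claim about what is fastest.

```python
import pandas as pd

# Hypothetical data and function, for illustration only.
df = pd.DataFrame({
    "col_1": [1, 2, 3],
    "col_2": [10, 20, 30],
    "other": ["x", "y", "z"],
})


def some_func(a, b):
    """Multiply the two values taken from one row."""
    return a * b


cols = ["col_1", "col_2"]

# Selecting the columns first limits which values each row passes on;
# axis=1 hands every row to the lambda as a Series, which * unpacks in column order.
df["result_col"] = df[cols].apply(lambda x: some_func(*x), axis=1)

print(df["result_col"].tolist())  # [10, 40, 90]
```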

gyx-hh

Thanks. I'm not going through all of the columns in the dataframe though.. otherwise this would be cleaner, no doubt. – mortysporty Oct 02 '18 at 12:38

you can do `df[['col1', 'col2', 'col3']].apply(...)` – gyx-hh Oct 02 '18 at 12:47

df.apply(), df.itertuples(), df.iteritems(), df.iterrows() are much slower than list comprehension, not recommended – questionto42 Jun 25 '20 at 11:35

questionto42 · Answer 3 · 2022-07-08T17:14:16.887

df.apply() is almost as slow as df.iterrows(), and neither is recommended. See "How to iterate over rows in a DataFrame in Pandas", search for "An Obvious Example" in the answer by @cs95, and look at the comparison graph there. Since the fastest approaches (vectorization, Cython routines) are not always easy to implement, the third-best and therefore usually most practical solution is a list comprehension:

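# print 3rd col
def some_func(row):
    print(row[2])


# zip() over a single 2-D array yields 1-tuples, so *row unpacks to one argument: the row array.
# some_func only prints and returns nothing, so result_col ends up filled with None.
df['result_col'] = [some_func(*row) for row in zip(df[['col_1', 'col_2',... ,'col_n']].to_numpy())]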

or

# print 3rd col
def some_func(row):
    print(row[2])

df['result_col'] = [some_func(row[0]) for row in zip(df[['col_1', 'col_2',... ,'col_n']].to_numpy())]

or

# print 3rd col
def some_func(x):
    print(x)

df['result_col'] = [some_func(row[0][2]) for row in zip(df[['col_1', 'col_2',... ,'col_n']].to_numpy())]
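
As a side note, the zip() wrapper in the variants above only adds a 1-tuple around each row, so the array rows can also be iterated directly. Below is a sketch under assumptions: the data, the `cols` list and the three helper functions are invented for illustration, and `some_func` returns a value instead of printing so that the column gets filled. The timing lines are a rough check only; results depend on machine, pandas version and data size, so no numbers are claimed here.

```python
import timeit

import pandas as pd

# Hypothetical all-integer data; with mixed dtypes to_numpy() would return an object array.
df = pd.DataFrame({
    "col_1": range(1000),
    "col_2": range(1000, 2000),
    "col_3": range(2000, 3000),
})
cols = ["col_1", "col_2", "col_3"]


def some_func(row):
    """Return the 3rd selected value of a row."""
    return row[2]


def via_to_numpy():
    # Iterating a 2-D array yields one 1-D row array per step; no zip() needed.
    return [some_func(row) for row in df[cols].to_numpy()]


def via_zip():
    # The accepted answer's pattern: zip the column Series into row tuples.
    return [some_func(var) for var in zip(*[df[col] for col in cols])]


def via_apply():
    # Row-wise apply; each row arrives as a Series and is converted to a tuple.
    return df[cols].apply(lambda x: some_func(tuple(x)), axis=1).tolist()


df["result_col"] = via_to_numpy()

# Rough timing only; compare the numbers on your own data.
for fn in (via_to_numpy, via_zip, via_apply):
    print(fn.__name__, timeit.timeit(fn, number=3))
```

All three helpers produce the same values, so the choice between them comes down to readability and whatever the timings show on your own data.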
