150

I have a dataframe that may look like this:

A        B        C
foo      bar      foo bar
bar foo  foo      bar

I want to apply the following function to every element of each row (or of each column) to get the resulting dataframe:

def foo_bar(x):
    return x.replace('foo', 'wow')

After applying the function, my dataframe will look like this:

A        B        C
wow      bar      wow bar
bar wow  wow      bar

Is there a simple one-liner that can apply a function to each cell?

This is a simplistic example so there may be an easier way to execute this specific example other than applying a function, but what I am really asking about is how to apply a function in every cell within a dataframe.

cottontail
eljusticiero67

3 Answers

221

You can use applymap(), which is concise for your case (since pandas 2.1 it is deprecated in favor of the equivalent DataFrame.map()):

df.applymap(foo_bar)

#          A    B        C
# 0      wow  bar  wow bar
# 1  bar wow  wow      bar

Another option is to vectorize your function and then use the apply method:

import numpy as np
df.apply(np.vectorize(foo_bar))
#          A    B        C
# 0      wow  bar  wow bar
# 1  bar wow  wow      bar
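
For code that has to run on both older and newer pandas releases, one option is to pick the cell-wise method at run time. This is only a minimal sketch: the DataFrame constructor call below is an assumption about how the question's frame was built, and it relies on DataFrame.map being available from pandas 2.1 onward.

```python
import pandas as pd

def foo_bar(x):
    return x.replace('foo', 'wow')

# Hand-built copy of the question's frame (an assumption, not from the original post)
df = pd.DataFrame({
    'A': ['foo', 'bar foo'],
    'B': ['bar', 'foo'],
    'C': ['foo bar', 'bar'],
})

# DataFrame.map is the cell-wise method from pandas 2.1 on;
# older releases only offer applymap
if hasattr(pd.DataFrame, 'map'):
    result = df.map(foo_bar)
else:
    result = df.applymap(foo_bar)

print(result)
```

Either branch returns a new frame and leaves df untouched.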
normanius
Psidom
3

I guess you could use np.vectorize:

>>> df[:] = np.vectorize(foo_bar)(df)
>>> df
         A    B        C
0      wow  bar  wow bar
1  bar wow  wow      bar

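This might be quicker since it hands the loop to numpy, although np.vectorize is essentially a Python-level for loop provided for convenience rather than performance, so it is worth timing on your own data.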
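
Note that np.vectorize returns a plain NumPy array, and the df[:] = ... assignment overwrites the frame in place. If the original frame should be kept, the array can be wrapped in a new DataFrame instead. A minimal sketch, assuming a hand-built copy of the question's frame; passing otypes=[object] is a choice made here so that NumPy keeps the Python strings as they are instead of inferring a string dtype:

```python
import numpy as np
import pandas as pd

def foo_bar(x):
    return x.replace('foo', 'wow')

# Hand-built copy of the question's frame (an assumption, not from the original post)
df = pd.DataFrame({
    'A': ['foo', 'bar foo'],
    'B': ['bar', 'foo'],
    'C': ['foo bar', 'bar'],
})

# otypes=[object] makes the vectorized function return an object array of Python strings
vec = np.vectorize(foo_bar, otypes=[object])

# Rebuild a frame around the resulting array, reusing the original labels
result = pd.DataFrame(vec(df.to_numpy()), index=df.index, columns=df.columns)
print(result)
```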

U13-Forward
1

Expanding on Psidom's answer, if the function you define accepts additional arguments, you can pass them along as keyword arguments. For example, to change the replacement string used by foo_bar() in the OP:

def foo_bar(x, bar=''):
    return x.replace('foo', bar)

df.applymap(foo_bar, bar='haha')
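
If the installed pandas version does not forward keyword arguments, or the code should not depend on that, functools.partial can bind the extra argument before the function is handed over. A minimal sketch, assuming a hand-built copy of the question's frame and choosing between DataFrame.map and applymap at run time:

```python
from functools import partial

import pandas as pd

def foo_bar(x, bar=''):
    return x.replace('foo', bar)

# Hand-built copy of the question's frame (an assumption, not from the original post)
df = pd.DataFrame({
    'A': ['foo', 'bar foo'],
    'B': ['bar', 'foo'],
    'C': ['foo bar', 'bar'],
})

# Use DataFrame.map where it exists (pandas 2.1+), otherwise applymap
cellwise = df.map if hasattr(df, 'map') else df.applymap

# partial() fixes bar='haha', leaving a one-argument function for the cell-wise call
result = cellwise(partial(foo_bar, bar='haha'))
print(result)
```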

One of the common cases where applymap is especially useful is string operations (as in the OP). Because string operations in pandas are not optimized, a plain loop often outperforms vectorized operations, especially when several operations are chained. For example, for the following simple task of conditionally replacing values in a frame, applymap is over 3 times faster than equivalent vectorized pandas code.

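import numpy as np
import pandas as pd

def foo_bar(x):
    return x.replace('foo', 'wow') if len(x)>3 else x + ' this'

# 1,000,000 rows; the %timeit lines below are IPython/Jupyter magics
df = pd.DataFrame([['foo', 'bar', 'foo bar'], ['bar foo', 'foo', 'bar']]*500000, columns=[*'ABC'])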

%timeit df.applymap(foo_bar)
# 1.47 s ± 37.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit df.apply(lambda x: np.where(x.str.len()>3, x.str.replace('foo', 'wow'), x + ' this'))
# 4.64 s ± 597 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
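
The comparison above can also be run outside IPython with the standard timeit module, and it is worth confirming first that both versions produce the same values. A rough sketch, assuming a much smaller frame than the one timed above (so the absolute numbers will differ) and using the hypothetical helper names loop_version and vectorized_version:

```python
import timeit

import numpy as np
import pandas as pd

def foo_bar(x):
    return x.replace('foo', 'wow') if len(x) > 3 else x + ' this'

# 20,000 rows instead of 1,000,000, to keep the run short
small = pd.DataFrame(
    [['foo', 'bar', 'foo bar'], ['bar foo', 'foo', 'bar']] * 10_000,
    columns=[*'ABC'],
)

# Use DataFrame.map where it exists (pandas 2.1+), otherwise applymap
cellwise = small.map if hasattr(small, 'map') else small.applymap

def loop_version():
    return cellwise(foo_bar)

def vectorized_version():
    return small.apply(
        lambda x: np.where(x.str.len() > 3, x.str.replace('foo', 'wow'), x + ' this')
    )

# Check that the two approaches agree before comparing their speed
same = loop_version().to_numpy().tolist() == vectorized_version().to_numpy().tolist()
print('same values:', same)

# Total seconds for 3 calls of each version
t_loop = timeit.timeit(loop_version, number=3)
t_vec = timeit.timeit(vectorized_version, number=3)
print(f'cell-wise loop: {t_loop:.3f} s, vectorized: {t_vec:.3f} s')
```

Timings depend on the pandas version, the string dtype in use, and the frame size, so the ratio reported above may not carry over.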
cottontail