I am applying some functions to pandas DataFrame columns, for example:

def foo(x):
    return 1 + x

Then, I apply the function to a column:

df['foo'] = df['a_col'].apply(foo)

How can I return a column with the number of milliseconds that the function foo takes to finish? For instance:

A  time_milisecs
2  0.1
4  0.2
4  0.3
3  0.3
4  0.2

Where A is the column that contains the result of the sum.

jpp
anon
  • It's just an example... obviously it's another function. I just want to create a column with the time in milliseconds that the function takes to finish – anon Dec 19 '18 at 15:06
  • This might be useful: https://stackoverflow.com/questions/24812253/how-can-i-capture-return-value-with-python-timeit-module – cs95 Dec 19 '18 at 15:09

1 Answer

You can use the time module. Since you also want to create a new series from the calculation, have the function return a tuple of (result, elapsed time), then convert the resulting sequence of tuples to a dataframe and assign it back to two columns.

Here's a demonstration:

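import time

import pandas as pd

df = pd.DataFrame({'A': [2, 4, 4, 3, 4]})

def foo(x):
    tstart = time.time()
    time.sleep(0.25)  # stand-in for the real work
    tend = time.time()
    # return the result and the elapsed time in milliseconds
    return 1 + x, (tend - tstart) * 10**3

# apply returns a series of tuples; split it into two columns
df[['B', 'B_time']] = pd.DataFrame(df['A'].apply(foo).values.tolist())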

print(df)

   A  B      B_time
0  2  3  250.014544
1  4  5  250.014305
2  4  5  250.014305
3  3  4  250.014305
4  4  5  250.014067

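With Python 3.7 or later, you can use time.perf_counter_ns, which measures wall-clock time in integer nanoseconds, or time.process_time_ns, which measures CPU time in nanoseconds. Note that process time excludes time spent sleeping, so it would report close to zero for the time.sleep call above.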
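
If you would rather not put timing code inside every function, the same idea can be moved into a wrapper. This is only a minimal sketch, not part of the demonstration above: `timed` is a hypothetical helper name, and it assumes Python 3.7+ for `time.perf_counter_ns`.

```python
import time
from functools import wraps

import pandas as pd


def timed(func):
    """Hypothetical helper: make func return (result, elapsed milliseconds)."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter_ns()  # wall-clock time, integer nanoseconds
        result = func(*args, **kwargs)
        elapsed_ms = (time.perf_counter_ns() - start) / 1e6
        return result, elapsed_ms
    return wrapper


@timed
def foo(x):
    return 1 + x


df = pd.DataFrame({'A': [2, 4, 4, 3, 4]})

# Each element of the applied series is a (result, elapsed_ms) tuple;
# split the tuples into two columns, keeping the original index.
pairs = df['A'].apply(foo).tolist()
df[['B', 'B_time']] = pd.DataFrame(pairs, index=df.index)
print(df)
```

The function body stays free of timing logic, so the decorator can be removed later without touching `foo` itself.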

jpp
  • I would've gone with `timeit`, even though `time` seems like the easy way out. See https://stackoverflow.com/questions/24812253/how-can-i-capture-return-value-with-python-timeit-module on how to monkeypatch `timeit` to return timings and function results together. – cs95 Dec 19 '18 at 15:22
  • @coldspeed, Do you believe the results will be any different? `timeit` is good, I find, for repeating a process multiple times to get the average time. But for a one-off calculation I don't see the difference. With Python 3.7, you also have `time.process_time_ns`. – jpp Dec 19 '18 at 15:25
  • The results will be the same, but the timings will be more accurate at least. If performance matters (i.e., if OP doesn't want to wait for the function to run 30 times), or if this is one of those functions that does stuff like db calls, io/async/rpc calls, then it's certainly not appropriate. Of course, with a question like this, one can never tell :-) – cs95 Dec 19 '18 at 15:27
  • @coldspeed could you provide an example? – anon Dec 20 '18 at 10:35
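
A rough sketch of the `timeit` idea from the comments, for what it's worth. It does not use the monkeypatching approach from the linked question: it simply calls the function once for its result, then lets `timeit.repeat` run it several more times and keeps the best per-call time. `timed_apply` and the `number`/`repeat` defaults are hypothetical choices, and because the function runs many times per row this only suits fast functions without side effects.

```python
import timeit

import pandas as pd


def foo(x):
    return 1 + x


def timed_apply(func, value, number=100, repeat=3):
    """Hypothetical helper: return (result, best per-call time in ms)."""
    result = func(value)  # one plain call to capture the return value
    # timeit.repeat returns a list of total times in seconds, one per
    # repeat, each covering `number` calls of the statement.
    totals = timeit.repeat(lambda: func(value), number=number, repeat=repeat)
    best_ms = min(totals) / number * 1e3
    return result, best_ms


df = pd.DataFrame({'A': [2, 4, 4, 3, 4]})

# Build (result, time) pairs row by row, then split into two columns.
pairs = [timed_apply(foo, x) for x in df['A']]
df[['B', 'B_time']] = pd.DataFrame(pairs, index=df.index)
print(df)
```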