I need to time the execution of a function across variable amounts of data.

import timeit

def foo(raw_data):
    preprocessed_data = preprocess_data(raw_data)
    # Raises NameError: preprocessed_data is not visible to the timed statement.
    time = timeit.Timer('module.expensive_func(preprocessed_data)', 'import module').timeit()

However, preprocessed_data is not a global variable. It cannot be imported with from __main__. It is local to this subroutine.

How can I import data into the timeit.Timer environment?

EMiller
  • Why not `time = timeit.Timer('module.expensive_func(data)', 'import module;data = generate_data()').timeit()`? Also, if you need something more complicated, you may actually want a [profiler](https://docs.python.org/2/library/profile.html). – Steven Rumbalski Sep 10 '14 at 15:09
  • @StevenRumbalski: Works for this scenario, but what if you need the data outside the timer too? – user2357112 Sep 10 '14 at 15:11
  • Bingo. Sorry, @StevenRumbalski, this is indeed the case - the data is outside the timer too. I've updated the question to reflect this. – EMiller Sep 10 '14 at 15:12
  • @EMiller: `'import module;from __main__ import otherdata1, otherdata2;data = generate_data()'` You can put as much code as you want inside that bit of setup code. If you have a lot of setup code, define it as a multiline string before the `timeit` call. – Steven Rumbalski Sep 10 '14 at 15:16
  • It's more than just an answer to your question, but I feel I should advertise my guide to the `timeit` module: http://stackoverflow.com/a/24105845/1763356 – Veedrac Sep 10 '14 at 20:36
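The setup-string approach from the comments can be sketched roughly as follows. This is only an illustration: `math.fsum` and the generated list stand in for `module.expensive_func` and the real data, and `number=1000` is an arbitrary choice (the default is 1,000,000 runs).

```python
import timeit

# Hypothetical setup code: everything the timed statement needs is
# created inside the setup string, so nothing has to be imported
# from the caller's scope.
setup = """
import math
data = [float(i) for i in range(1000)]
"""

# The statement runs in the namespace prepared by the setup string.
timer = timeit.Timer("math.fsum(data)", setup=setup)
elapsed = timer.timeit(number=1000)  # total seconds for 1000 runs
print(elapsed)
```

This only helps when the data can be built inside the setup code; it does not cover data that already exists as a local variable.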

2 Answers

Pass it a callable to time, rather than a string. (Unfortunately, this introduces some extra function call overhead, so it's only viable when the thing to time swamps that overhead.)

time = timeit.timeit(lambda: module.expensive_func(data))

In Python 3.5 and up, you can also specify an explicit globals dictionary with a string statement to time:

time = timeit.timeit('module.expensive_func(data)',
                     globals={'module': module, 'data': data})
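A minimal, self-contained sketch of both options applied to a local variable, as in the question. The names `time_local` and `preprocessed` are made up for illustration, and `math.fsum` stands in for `module.expensive_func`; `number` is lowered from its default of 1,000,000 so the example finishes quickly.

```python
import math
import timeit


def time_local(raw_data, number=1000):
    # `preprocessed` is local to this function, as in the question.
    preprocessed = sorted(raw_data)

    # Option 1: a callable that closes over the local variable.
    t_callable = timeit.timeit(lambda: math.fsum(preprocessed), number=number)

    # Option 2 (Python 3.5+): a string statement plus an explicit
    # globals dictionary that supplies the names the statement uses.
    t_string = timeit.timeit(
        "math.fsum(preprocessed)",
        globals={"math": math, "preprocessed": preprocessed},
        number=number,
    )
    return t_callable, t_string


t_callable, t_string = time_local([float(i) for i in range(1000)])
print(t_callable, t_string)
```

Both values are total seconds for `number` runs, so divide by `number` for a per-call figure.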
user2357112
  • FWIW, using `functools.partial` removes about half of the function call overhead. – Veedrac Sep 10 '14 at 20:37
  • @Veedrac: I don't think that's actually removing the function call overhead, though. I think it's removing name lookup overhead that would occur in the real case. – user2357112 Sep 10 '14 at 20:52
  • @user2357112 True, but the point is that it's faster :P. Proper timings should time and subtract call overhead, so the smaller it is the less error it produces. – Veedrac Sep 10 '14 at 21:09
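The `functools.partial` suggestion, together with the idea of timing and subtracting call overhead, might look like the sketch below. The `noop` baseline is an assumption about how to approximate that overhead, not something specified in the comments, and `math.fsum` again stands in for the real function.

```python
import functools
import math
import timeit

data = [float(i) for i in range(1000)]
number = 1000

# partial binds the function and its argument up front, so the timed
# callable does no name lookups for `math`, `fsum`, or `data` per call.
bound = functools.partial(math.fsum, data)
t_partial = timeit.timeit(bound, number=number)


def noop():
    """Do nothing; used as a rough estimate of bare call overhead."""


# Rough baseline: the cost of calling an empty function `number` times.
t_baseline = timeit.timeit(noop, number=number)

# Approximate time spent in the function itself.
estimate = t_partial - t_baseline
print(t_partial, t_baseline, estimate)
```

The subtraction is only an approximation, since calling a `partial` object and calling a plain function do not cost exactly the same.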

The accepted answer didn't work for me inside the pdb debugger, in a class method. What worked was adding the variables to globals():

globals()['data'] = data
globals()['self'] = self
timeit.timeit(lambda: self.function(data))

Note that the timing overhead is a little larger in this case because of the extra function calls. [source]
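As a different technique from the one above, on Python 3.5+ the same names can be handed to `timeit` through its `globals` argument instead of being written into the module's `globals()`. This is a sketch only: the `Worker` class and `time_function` method are hypothetical, and it has not been checked under pdb.

```python
import timeit


class Worker:  # hypothetical class, for illustration only
    def function(self, data):
        return sum(data)

    def time_function(self, data, number=1000):
        # Pass `self` and `data` explicitly rather than mutating globals().
        return timeit.timeit(
            "self.function(data)",
            globals={"self": self, "data": data},
            number=number,
        )


elapsed = Worker().time_function(list(range(100)))
print(elapsed)
```

This keeps the module namespace untouched, at the cost of requiring Python 3.5 or later.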

Dennis Golomazov