
I'm running several machine learning algorithms with sklearn in a for loop and want to see how long each of them takes. The problem is that I also need the return value and don't want to run anything more than once, because each algorithm takes so long. Is there a way to capture the return value 'clf' using Python's timeit module, or a similar one, with a function like this...

def RandomForest(train_input, train_output):
    clf = ensemble.RandomForestClassifier(n_estimators=10)
    clf.fit(train_input, train_output)
    return clf

when I call the function like this

t = Timer(lambda : RandomForest(trainX,trainy))
print t.timeit(number=1)

P.S. I also don't want to set a global 'clf' because I might want to do multithreading or multiprocessing later.

Hugh Perkins
Leon
  • Why do you even use `timeit` if you force `number=1`? `timeit` is useful to automatically handle *repetitive* timing, where you don't know how much time you should run the function to get a good timing etc. In your case simply using `time` would be fine and you wouldn't need any hack to get the return value. – Bakuriu Jul 17 '14 at 20:24
  • Can you provide an example link for me to see what you are referring to? I google time and it seems that the module which you might be talking about only seems to involve formatting dates and timezones, etc – Leon Jul 17 '14 at 20:30
  • Never heard of [`time.time()`](https://docs.python.org/2.7/library/time.html#time.time)? Or [`time.clock()`](https://docs.python.org/2.7/library/time.html#time.clock)? The `timeit` module uses these functions to perform the timings. If you only have to do *one* timing you can simply call them directly, in the same way as the `_timer` function is used in unutbu answer (that is actually a reference to `time.time` or `time.clock` depending on the OS). – Bakuriu Jul 17 '14 at 20:34
  • @Bakuriu I understood that timeit also does other things, like turn off garbage collection to make sure that we're doing a fair comparison. i.e., that we're looking at execution time, not wall time. – Joel Apr 10 '18 at 05:10
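
For a single run, the approach suggested in the comments above can be sketched roughly as follows. This is only an illustration: `slow_square` is a made-up stand-in for the training function, and `time.perf_counter` (Python 3.3+) is assumed as the clock in place of `time.time()` or `time.clock()`.

```python
import time

def slow_square(x):
    # Hypothetical stand-in for an expensive call such as RandomForest(...)
    time.sleep(0.05)
    return x * x

start = time.perf_counter()
value = slow_square(7)                  # the function runs exactly once
elapsed = time.perf_counter() - start   # wall-clock seconds for that run
print("result=%d, took %.3fs" % (value, elapsed))
```

Unlike `timeit`, this does not disable garbage collection while timing, which is probably negligible for runs that take seconds or minutes.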

9 Answers

23

For Python 3.5 and later, you can override the value of timeit.template:

timeit.template = """
def inner(_it, _timer{init}):
    {setup}
    _t0 = _timer()
    for _i in _it:
        retval = {stmt}
    _t1 = _timer()
    return _t1 - _t0, retval
"""

unutbu's answer works for Python 3.4 but not 3.5, as the _template_func function appears to have been removed in 3.5.
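
A possible way to use this without leaving `timeit` permanently patched is sketched below. The helper name `timeit_with_result` and the constant `RETURNING_TEMPLATE` are hypothetical, not part of `timeit`; the sketch assumes Python 3.5+ where `Timer` reads `timeit.template` when it is constructed.

```python
import time
import timeit

# Same idea as the template above: keep the statement's value and return it
RETURNING_TEMPLATE = """
def inner(_it, _timer{init}):
    {setup}
    _t0 = _timer()
    for _i in _it:
        retval = {stmt}
    _t1 = _timer()
    return _t1 - _t0, retval
"""

def timeit_with_result(func, number=1):
    """Hypothetical helper: returns (elapsed_seconds, last_return_value)."""
    original = timeit.template
    timeit.template = RETURNING_TEMPLATE
    try:
        # Timer builds its inner function from the template in the constructor
        timer = timeit.Timer(func)
        return timer.timeit(number=number)
    finally:
        timeit.template = original  # restore timeit for other callers

def foo():
    time.sleep(0.05)
    return 42

elapsed, value = timeit_with_result(foo)
print(elapsed, value)
```

Swapping a module attribute like this is not thread-safe, so it probably only suits single-threaded timing code.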

Community
18

The problem boils down to timeit._template_func not returning the function's return value:

def _template_func(setup, func):
    """Create a timer function. Used if the "statement" is a callable."""
    def inner(_it, _timer, _func=func):
        setup()
        _t0 = _timer()
        for _i in _it:
            _func()
        _t1 = _timer()
        return _t1 - _t0
    return inner

We can bend timeit to our will with a bit of monkey-patching:

import timeit
import time

def _template_func(setup, func):
    """Create a timer function. Used if the "statement" is a callable."""
    def inner(_it, _timer, _func=func):
        setup()
        _t0 = _timer()
        for _i in _it:
            retval = _func()
        _t1 = _timer()
        return _t1 - _t0, retval
    return inner

timeit._template_func = _template_func

def foo():
    time.sleep(1)
    return 42

t = timeit.Timer(foo)
print(t.timeit(number=1))

returns

(1.0010340213775635, 42)

The first value is the timeit result (in seconds), the second value is the function's return value.

Note that the monkey-patch above only affects the behavior of timeit when a callable is passed to timeit.Timer. If you pass a string statement, then you'd have to (similarly) monkey-patch the timeit.template string.

unutbu
  • Hmm, this seems to be returning the function and not the function's return value. I have to capture it with ret_val = t.timeit(number=1)[1]() to actually run the function and get the value back. Isn't that running the function twice, though? – Leon Jul 17 '14 at 20:26
  • Given the code you posted, I don't see why `t.timeit` should be returning a function. Do you get the same result as I do when you run the code I posted? If so, then you need to compare what's different between that code and your code (paying particular attention to the *type* of the objects passed and returned.) – unutbu Jul 17 '14 at 20:39
  • You are right, I was still using timeit.Timer( lambda: dummy) instead of just timeit.Timer( dummy). There are some exceptionally smart people on StackOverflow. Damn I love this site. – Leon Jul 17 '14 at 20:46
  • From looking at the source for timeit, it appears the purpose of the module is to be used at the command line as a testing tool for optimizing your code and Python itself. If you are writing an app to test something, say the speed of an API call, you may be better off using time.perf_counter twice and subtracting the two numbers. – Chris Huang-Leaver Dec 14 '18 at 03:09
  • This may have worked, but as of 2023-07-11 it doesn't. The provided code only returns the time taken, as it would normally. The answer by Brendan Cody-Kenny resolves that (although it ain't pretty) – Grismar Jul 11 '23 at 06:25
8

Funnily enough, I'm also doing machine-learning, and have a similar requirement ;-)

I solved it as follows, by writing a function, that:

  • runs your function
  • prints the running time, along with the name of your function
  • returns the results

Let's say you want to time:

clf = RandomForest(train_input, train_output)

Then do:

clf = time_fn( RandomForest, train_input, train_output )

Stdout will show something like:

mymodule.RandomForest: 0.421609s

Code for time_fn:

import time

def time_fn( fn, *args, **kwargs ):
    # time.clock() was removed in Python 3.8; perf_counter() is the replacement
    start = time.perf_counter()
    results = fn( *args, **kwargs )
    end = time.perf_counter()
    fn_name = fn.__module__ + "." + fn.__name__
    print(fn_name + ": " + str(end-start) + "s")
    return results
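
Applied to the loop described in the question, the same idea can collect both the timings and the returned objects. The following is just a sketch under assumptions: `time_fn_quiet` is a hypothetical variant that returns the elapsed time instead of printing it, and `fast_model` / `slow_model` are stand-ins for real sklearn training functions.

```python
import time

def time_fn_quiet(fn, *args, **kwargs):
    """Hypothetical variant of time_fn: returns (results, seconds)."""
    start = time.perf_counter()
    results = fn(*args, **kwargs)
    return results, time.perf_counter() - start

# Stand-ins for the real training functions
def fast_model(data):
    return sum(data)

def slow_model(data):
    time.sleep(0.05)
    return max(data)

data = [3, 1, 4, 1, 5]
fitted, timings = {}, {}
for fn in (fast_model, slow_model):
    # Each function runs once; its result and its timing are both kept
    fitted[fn.__name__], timings[fn.__name__] = time_fn_quiet(fn, data)

for name in fitted:
    print("%s: %.6fs -> %r" % (name, timings[name], fitted[name]))
```

Because nothing is stored in a global, each thread or process could call the helper independently.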
Hugh Perkins
3

If I understand correctly, from Python 3.5 onward you can pass globals to each Timer instance without having to define them in your block of code. I am not sure whether it would have the same issues with parallelization.

My approach would be something like:

from timeit import Timer
from sklearn import ensemble

clf = ensemble.RandomForestClassifier(n_estimators=10)
myGlobals = globals()
myGlobals.update({'clf': clf})  # dict literals use ':', not '='
t = Timer(stmt='clf.fit(trainX,trainy)', globals=myGlobals)
print(t.timeit(number=1))
print(clf)
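
If the goal is to keep the return value without touching the module's globals at all, a dedicated dictionary can be passed instead. This is a rough sketch, not a tested recipe for sklearn: `fit_model` and the `out` holder are made-up names, and the `globals` argument requires Python 3.5+.

```python
import time
from timeit import Timer

def fit_model():
    # Hypothetical stand-in for a function that trains and returns a model
    time.sleep(0.05)
    return "fitted-model"

# A private namespace for this Timer; module globals are left untouched
namespace = {"fit_model": fit_model, "out": {}}
t = Timer(stmt="out['clf'] = fit_model()", globals=namespace)
elapsed = t.timeit(number=1)

# The statement stored its return value in the holder dict
clf = namespace["out"]["clf"]
print(elapsed, clf)
```

Each Timer can get its own namespace, which may make this easier to reason about with several threads than a shared global.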
Xavier
  • Nice shot, definitely the more elegant solution, it also allows to pass dictionary to `timeit.Timer`. Thank you for sharing – jlandercy Mar 25 '19 at 10:05
3

As of 2020, in IPython or a Jupyter notebook it is:

t = %timeit -n1 -r1 -o RandomForest(trainX, trainy)
t.best
Antony Hatchkins
  • You're mixing results: The OP wants the result of the timed function `clf` in order to not run this function twice (once to get the result, once to get the time), not the result of the "magic" `timeit` IPython function (which `-o` indeed provides). – mins Dec 17 '20 at 12:23
1

If you don't want to monkey-patch timeit, you could try using a global list, as below. This will also work in Python 2.7, where timeit() does not have a globals argument:

from timeit import timeit
import time

# Function to time - plagiarised from answer above :-)
def foo():
    time.sleep(1)
    return 42

result = []
# print(...) with a single argument works in both Python 2.7 and Python 3
print(timeit('result.append(foo())', setup='from __main__ import result, foo', number=1))
print(result[0])

will print the time and then the result.
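
A variation on the same trick that avoids the module-level list is to let a lambda close over a local one. This is only a sketch; `run_and_time` is a hypothetical wrapper, and the lambda adds a small call overhead to the measured time.

```python
import time
from timeit import Timer

def foo():
    time.sleep(0.05)
    return 42

def run_and_time():
    results = []  # local to this call, so each thread can have its own
    timer = Timer(lambda: results.append(foo()))
    elapsed = timer.timeit(number=1)
    return elapsed, results[0]

elapsed, value = run_and_time()
print(elapsed, value)
```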

Jerzy
0

An approach I'm using is to "append" the running time to the results of the timed function. So I wrote a very simple decorator using the "time" module:

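import time

def timed(func):
    def func_wrapper(*args, **kwargs):
        # time.clock() was removed in Python 3.8; use perf_counter() instead
        s = time.perf_counter()
        result = func(*args, **kwargs)
        e = time.perf_counter()
        # assumes func returns a tuple; the elapsed seconds become its last element
        return result + (e-s,)
    return func_wrapper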

And then I use the decorator for the function I want to time.
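
For functions that do not return a tuple (a fitted classifier, for example), a variant that always returns a `(result, seconds)` pair may be easier to use. The sketch below is an assumption-laden illustration: `timed_pair` and `train` are hypothetical names, and `functools.wraps` is used only to preserve the wrapped function's name.

```python
import functools
import time

def timed_pair(func):
    """Hypothetical variant: always returns (result, seconds)."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        return result, time.perf_counter() - start
    return wrapper

@timed_pair
def train(n):
    # Stand-in for a real training function
    time.sleep(0.05)
    return {"n_estimators": n}

model, seconds = train(10)
print(model, seconds)
```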

0

The original question wanted allowance for multiple results, multithreading, and multiprocessing. For all those, a queue will do the trick.

# put the result to the queue inside the function, via the globally named resultq
def RandomForest(train_input, train_output):
    clf = ensemble.RandomForestClassifier(n_estimators=10)
    clf.fit(train_input, train_output)
    global resultq
    resultq.put(clf)
    return clf

# put the result to the queue inside the function, to a queue parameter
def RandomForest(train_input, train_output,resultq):
    clf = ensemble.RandomForestClassifier(n_estimators=10)
    clf.fit(train_input, train_output)
    resultq.put(clf)
    return clf

# put the result to the queue outside the function
def RandomForest(train_input, train_output):
    clf = ensemble.RandomForestClassifier(n_estimators=10)
    clf.fit(train_input, train_output)
    return clf


#usage:
#     global resultq
#     t=RandomForest(train_input, train_output)
#     resultq.put(t)

# in a timeit usage, add an import for the resultq into the setup.
setup="""
from __main__ import resultq
"""

# # in __main__  # #

# for multiprocessing and/or multithreading
import multiprocessing as mp
# 'global resultq = ...' is a syntax error; a plain module-level assignment already creates the global
resultq = mp.Queue()

# Alternatively,

# for multithreading only
import queue
resultq = queue.Queue()

#   do processing

# eventually, drain the queue

while not resultq.empty():
  aclf=resultq.get()
  print(aclf)
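
Putting the pieces together for the threading case, a runnable sketch might look like the following. It assumes the queue is passed as a parameter (the second variant above), uses `time.sleep` as a stand-in for model fitting, and times each worker with `time.perf_counter`; the `train` function and its arguments are made up for illustration.

```python
import queue
import threading
import time

def train(name, delay, resultq):
    """Stand-in for a training function; puts (name, seconds, model) on the queue."""
    start = time.perf_counter()
    time.sleep(delay)            # pretend to fit a model
    model = "model-" + name
    resultq.put((name, time.perf_counter() - start, model))

resultq = queue.Queue()
threads = [threading.Thread(target=train, args=(name, delay, resultq))
           for name, delay in [("a", 0.05), ("b", 0.02)]]
for th in threads:
    th.start()
for th in threads:
    th.join()

# All workers have finished, so the queue can be drained safely
results = {}
while not resultq.empty():
    name, seconds, model = resultq.get()
    results[name] = (seconds, model)
print(results)
```

Each function runs once, and both its timing and its return value travel through the queue, so no shared global result variable is needed.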
user15972
-1

For Python 3.X I use this approach:

import timeit

# Redefining default Timer template to make 'timeit' return
#     test's execution timing and the function return value
new_template = """
def inner(_it, _timer{init}):
    {setup}
    _t0 = _timer()
    for _i in _it:
        ret_val = {stmt}
    _t1 = _timer()
    return _t1 - _t0, ret_val
"""
timeit.template = new_template