How to record code execution timings for multiple functions as a pandas dataframe in Python

Question

I have multiple functions, each designed to perform a particular task. As an example, suppose I want to prepare lunch. Possible functions for this task are collect_vege_images(), remove_duplicates(), and count_veges_in_multiple_plates().

Problem:

I want to generate a dataframe where each row records the function start time, end time and elapsed time.

Iteration   image count   Function                         start time   end time   elapse time
1           200           collect_vege_images              11.00        11.10      0.10
            200           remove_duplicates                11.10        11.15      0.05
            100           count_veges_in_multiple_plates   11.16        11.20      0.04
2           300           collect_vege_images              11.21        11.31      0.10
            150           remove_duplicates                11.31        11.35      0.04
            50            count_veges_in_multiple_plates   11.35        11.39      0.04

What I have tried so far

I have written both functions; however, I'm not able to get the output I want. The code and the output it generates are given below. I've already looked at similar questions 1, 2, 3, 4, but most are about timing individual functions only. I'm looking for a simple solution, as I'm a beginner in Python programming.

import os
import shutil
import pandas as pd
import time
    
VEGE_SOURCE = r'\\path to vegetable images'
VEGE_COUNT = 5
VEGE_DST = r'\\path to storing vegetable images'
LOG_FILE_NAME = 'vege_log.csv'
plates = 5
CYCLES=5
counter = []
VEGEID = 'Potato'

def collect_vege_images(VEGE_SOURCE):
    plate = os.listdir(VEGE_SOURCE)

    if len(plate) == 0:
        print("vege source is empty: ", VEGE_SOURCE)

    else:

        for i in range(VEGE_COUNT):
            vege = plate[0]
            curr_vege = VEGE_SOURCE + '\\' + vege
            shutil.move(curr_vege, VEGE_DST)
            plate.pop(0)
    return

def count_veges_in_multiple_plates(plates):
    vege_img_count = 0
    for root, dirname, files in os.walk(plates):
        # print(files)
        file_count = len(files)
        vege_img_count += file_count

    return vege_img_count

    
if __name__ == "__main__":

    collect_vege_images(VEGE_SOURCE)
    img_count = count_veges_in_multiple_plates(plates=5)


    for i in range(CYCLES):
        print("Round # ", i)
        counter.append(i)
        # print("counter: ", counter)
        start_time = time.process_time()
        collect_vege_images(VEGE_SOURCE)
        count_veges_in_multiple_plates(plates=5)
        end_time = time.process_time()
        elapse_time = round((end_time - start_time), 2)
        fun = collect_vege_images.__name__
        df = pd.DataFrame(
            {'vegeid': VEGEID, 'imgcnt': img_count, 'func': fun, 'start_time': start_time, 'end_time': end_time,
             'elapse_time': elapse_time}, index=[0])
        print(df)

The current code output is given below:

Round #  1
Moving files...
122 files moved!
iteration  imgcnt   func                    start_time  end_time  elapse_time
   0         122    collect_vege_images     22.10       22.15         0.5
Round #  2
Moving files...
198 files moved!
iteration  imgcnt   func                    start_time  end_time  elapse_time
   1         122    collect_vege_images     22.15       22.19         0.04

This code gives multiple errors, are you sure this is the right one? — Ietu, Jan 19 '23 at 18:46

Alex Bochkarev · Accepted Answer · 2023-01-23T02:37:53.933

You can create a simple decorator to measure start, end and elapsed time and append it to your dataframe.

import pandas as pd
import time

def timeit(func):
    def patched_func(*args, perf_data, **kwargs):
        time_start = time.process_time()
        result = func(*args, **kwargs)
        time_end = time.process_time()
        new_row = pd.Series(
            {
                'vegeid': kwargs['vege_id'],
                'imgcnt': kwargs['image_count'],
                'func': func.__name__,
                'start_time': time_start,
                'end_time': time_end,
                'elapse_time': time_end - time_start
            }
        )
        perf_data.append(new_row)
        return result
    return patched_func
    
@timeit
def collect(vege_id, image_count):
    pass
    
@timeit
def remove(vege_id, image_count):
    pass

@timeit
def count(vege_id, image_count):
    pass
    
perf_data = []
for i in range(10):
    collect(vege_id=i, image_count=1 * i, perf_data=perf_data)
    remove(vege_id=i, image_count=10 * i, perf_data=perf_data)
    count(vege_id=i, image_count=100 * i, perf_data=perf_data)
    
print(pd.DataFrame(perf_data))

Output:

    vegeid  imgcnt     func  start_time   end_time  elapse_time
0        0       0  collect    9.997021   9.997028     0.000007
1        0       0   remove    9.997492   9.997497     0.000005
2        0       0    count    9.997701   9.997704     0.000003
3        1       1  collect    9.997909   9.997914     0.000004
4        1      10   remove    9.998127   9.998130     0.000003
5        1     100    count    9.998308   9.998311     0.000003

Some explanation of what's happening here.

timeit is a decorator. A decorator is a function that takes another function as an argument, modifies it, and returns the modified function, which then replaces the original one. To use a decorator, put @ before the function declaration:

@my_decorator
def my_function():
    ...

which is equivalent to

def my_function():
    ...

my_function = my_decorator(my_function)

Refer to this answer if you want to learn more about decorators.

The *args, **kwargs syntax is a way to capture any number of positional (passed without a keyword) and keyword arguments. You can refer to this answer to learn more.
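As a small illustration of how the arguments are captured, here is a minimal sketch. The function names show_arguments and wrapper are hypothetical and exist only for this example. The second function mirrors the signature of patched_func, where perf_data is placed after *args and so can only be passed by keyword.

```python
def show_arguments(*args, **kwargs):
    """Return the captured positional and keyword arguments unchanged."""
    # args is a tuple of positional arguments, kwargs is a dict of keyword arguments
    return args, kwargs


def wrapper(*args, perf_data, **kwargs):
    """Same shape as patched_func: perf_data is keyword-only and is not part of kwargs."""
    return args, perf_data, kwargs


positional, keyword = show_arguments(1, 2, vege_id="Potato", image_count=200)
print(positional)  # (1, 2)
print(keyword)     # {'vege_id': 'Potato', 'image_count': 200}

captured = wrapper(1, perf_data=[], vege_id="Potato")
print(captured)    # ((1,), [], {'vege_id': 'Potato'})
```

Because perf_data is taken out by the wrapper, it never reaches the decorated function, which is why collect, remove and count do not need a perf_data parameter.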

a follow-up question. How to add code execution iteration count as a column in the decorator `timeit()`? Example, say function `collect()` runs multiple times per code execution cycle. I need a column called say `iteration` with a value of `1`. So if function collect executes five times, then iteration column will have value 1 repeated five times. Please advise me if you know how to accomplish this. — mnm, Jan 31 '23 at 01:32
You can add `iteration` parameter the same way I've added `vege_id` parameter. Pass it to the function, then store it in the pd.Series. — Alex Bochkarev, Feb 01 '23 at 01:44
thanks very much for taking the time out to post a response. It worked. — mnm, Feb 01 '23 at 02:51
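For readers with the same follow-up question, here is a minimal sketch of what the comment above suggests, assuming iteration is passed as a keyword argument exactly like vege_id. The loop bounds (two cycles, five calls per cycle) and the argument values are made up for illustration.

```python
import time

import pandas as pd


def timeit(func):
    def patched_func(*args, perf_data, **kwargs):
        time_start = time.process_time()
        result = func(*args, **kwargs)
        time_end = time.process_time()
        new_row = pd.Series(
            {
                # iteration is read from the keyword arguments, like vege_id
                'iteration': kwargs['iteration'],
                'vegeid': kwargs['vege_id'],
                'imgcnt': kwargs['image_count'],
                'func': func.__name__,
                'start_time': time_start,
                'end_time': time_end,
                'elapse_time': time_end - time_start,
            }
        )
        perf_data.append(new_row)
        return result
    return patched_func


@timeit
def collect(vege_id, image_count, iteration):
    pass


perf_data = []
for iteration in range(1, 3):   # two execution cycles (hypothetical)
    for _ in range(5):          # collect runs five times per cycle (hypothetical)
        collect(vege_id='Potato', image_count=200,
                iteration=iteration, perf_data=perf_data)

df = pd.DataFrame(perf_data)
print(df[['iteration', 'func', 'elapse_time']])
```

With this setup the iteration column holds 1 for the first five rows and 2 for the next five. All three values must be passed by keyword, since the decorator looks them up in kwargs.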
