I'm facing a rather simple task: my SQL query returns three separate columns, one for hours, one for minutes and one for seconds. I want to combine them into a single time value.
My approach was to call dt.time row by row via DataFrame.apply:
# Import relevant libraries
import datetime as dt
from timeit import timeit
import pandas as pd
import numpy as np
# Create an example Dataframe
rng = np.random.default_rng()
test = pd.DataFrame({
    "hours": rng.integers(0, 24, 1_000_000),
    "minutes": rng.integers(0, 60, 1_000_000),
    "seconds": rng.integers(0, 60, 1_000_000),
})
# Build the time column row by row
test["time"] = test.apply(lambda x: dt.time(x.hours, x.minutes, x.seconds), axis = 1)
The result is ridiculously slow in my real-world scenario, taking more than 6 minutes for approximately 4 million rows.
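For comparison, here is a minimal sketch of a column-wise alternative that avoids calling a Python function once per row. It rests on assumptions that are not part of my setup above: the frame name `small`, the column names `elapsed` and `time`, the fixed seed, and the arbitrary anchor date `1970-01-01` are all hypothetical, and I have not timed it against my real data.

```python
import datetime as dt

import numpy as np
import pandas as pd

# Small, seeded example frame (hypothetical name `small`) so the sketch runs quickly
rng = np.random.default_rng(0)
n = 1_000
small = pd.DataFrame({
    "hours": rng.integers(0, 24, n),
    "minutes": rng.integers(0, 60, n),
    "seconds": rng.integers(0, 60, n),
})

# Seconds since midnight, computed on whole columns instead of row by row
total_seconds = small["hours"] * 3600 + small["minutes"] * 60 + small["seconds"]

# Option 1: keep the value as a timedelta column (stays a native pandas dtype)
small["elapsed"] = pd.to_timedelta(total_seconds, unit="s")

# Option 2: if dt.time objects are really needed, add the timedelta to an
# arbitrary anchor date (an assumption of this sketch) and take the time part
small["time"] = (pd.Timestamp("1970-01-01") + small["elapsed"]).dt.time
```

The timedelta column keeps the data in a vectorized dtype, which is usually where the speed comes from. The second option converts back to Python `dt.time` objects, so the resulting column has object dtype again.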