I have a temporal KDE kernel as a list (or numpy array) of values, where the index of each value represents the corresponding minute of the week.
My data is approximately as described below:
- kde: list or ndarray of float values, of length 7*24*60.
- df: DataFrame with ~50 columns of different types, including a timestamp column with integer values in the range 0 to 7*24*60-1. The dataframe has ~2,000,000 records.
As a sample:
col1|col2|...|col49|timestamp
1 | 2 |...| 49 | 15
2 | 3 |...| 50 | 16
My desired output is the very same dataframe, with a kd column containing the corresponding values from kde. In other words, for each record in the dataframe, I need to get the KDE value using the record's timestamp. I need to do it as fast as possible.
Desired outcome:
col1|col2|...|col49|timestamp | kd
1 | 2 |...| 49 | 15 | 0.342
2 | 3 |...| 50 | 16 | 0.543
For now, I use .apply():
df['kd'] = df.timestamp.apply(lambda z: kde[z])
However, it is relatively slow, as (as far as I understand) it is still subject to the GIL limitation. Is there any way to vectorise this very simple function?
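One possible vectorised form of this lookup, as a minimal sketch: since timestamp already holds valid integer positions into kde, NumPy integer-array indexing can fetch all values in one call instead of one Python call per row. The small kde and df built here are synthetic stand-ins assumed for illustration, and kde_arr and MINUTES_PER_WEEK are hypothetical names.

```python
import numpy as np
import pandas as pd

MINUTES_PER_WEEK = 7 * 24 * 60

# Synthetic stand-ins for the real kde and df (assumed for illustration only).
rng = np.random.default_rng(0)
kde = rng.random(MINUTES_PER_WEEK).tolist()  # a plain list, as in the question
df = pd.DataFrame({
    "col1": np.arange(1000),
    "timestamp": rng.integers(0, MINUTES_PER_WEEK, size=1000),
})

# Convert once so that a list or an ndarray both work.
kde_arr = np.asarray(kde, dtype=float)

# Index the kernel with the whole timestamp column at once.
df["kd"] = kde_arr[df["timestamp"].to_numpy()]
```

This assumes every timestamp is an integer within 0 to 7*24*60-1 with no missing values; a NaN in the column would make it float-typed and the indexing would fail.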