How to transform a for loop into a lambda function

Question

I have written this function:

from datetime import timedelta

import pandas as pd


def time_to_unix(df,dateToday):
    '''Build the Unix timestamps for the dataframe: take today's date (ex: 2022-8-8 0:0:0)
        and add the seconds stored in the timestamp column.

        input: dataframe, dateToday(type: pandas.core.series.Series)
        output: list of times
    '''

    dateTime = dateToday[0]
    times = []

    for i in range(0,len(df['timestamp'])):
        dateAndTime = dateTime + timedelta(seconds = float(df['timestamp'][i]))
        unix = pd.to_datetime([dateAndTime]).astype(int) / 10**9
        times.append(unix[0])
    return times

So it takes a dataframe and today's date, reads each value in the dataframe's timestamp column (which is in seconds, like 10, 20, ...), adds it to the date, and returns the times as Unix time.

However, because I have approximately 2 million rows in my dataframe, this code takes a long time to run.

How can I use a lambda function, or something else, to speed it up?

Something along the lines of:

df['unix'] = df.apply(lambda row: <something in here>, axis=1)

What exactly is `dateToday` (type, content, etc.) -- `dateToday[0]` is really strange? Please add some sample data and corresponding expected output. — Timus, Aug 10 '22 at 12:38
@Timus - it's in the docstring (dateToday(type: pandas.core.series.Series)) — Allan Elder, Aug 10 '22 at 12:46
@AllanElder Damn, my bad, thanks! But does it contain `pd.Timestamps`? Or `datetime.date`s? — Timus, Aug 10 '22 at 12:52
@timus - that I'm not clear on - sample data would definitely help — Allan Elder, Aug 10 '22 at 12:55

score 0 · Answer 1 · answered Aug 10 '22 at 11:38

What I think you'll find is that most of the time is spent creating and manipulating the datetime / timestamp objects. I also avoid using lambdas with apply on large dataframes, since they run row by row in Python, which is slow. What I've done in the past when dealing with datetimes / timestamps / timezone changes is to build a dictionary of the unique datetime combinations and then use map to apply them. Something like this (assuming the dataframe has a date column alongside timestamp):

import datetime as dt
import pandas as pd


#Make a time key column out of your date and timestamp fields (both cast to str so they can be concatenated)
df['time_key'] = df['date'].astype(str) + '@' + df['timestamp'].astype(str)

#Build a dictionary from the unique time keys in the dataframe
time_dict = dict()
for time_key in df['time_key'].unique():
    time_split = time_key.split('@')
    #Create the Unix time stamp based on the values in the key; store it in the dictionary so it can be mapped later
    #.timestamp() returns seconds since the epoch as a float
    time_dict[time_key] = (pd.to_datetime(time_split[0]) + dt.timedelta(seconds=float(time_split[1]))).timestamp()

#Now map the time_key to the unix column in the dataframe from the dictionary
df['unix'] = df['time_key'].map(time_dict)

Note that if every date / timestamp combination in the dataframe is unique, this likely won't help, because the dictionary then has as many entries as the dataframe has rows.
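A self-contained sketch of the mapping idea on made-up data may make the trade-off easier to see. The column names `date` and `timestamp`, the sample values, and the helper `to_unix` are assumptions for illustration, not the asker's actual data. Naive dates are treated as UTC here.

```python
import pandas as pd

# Hypothetical sample data: one date repeated, offsets in seconds with duplicates
df = pd.DataFrame({
    "date": ["2022-08-08"] * 6,
    "timestamp": [10, 20, 10, 30, 20, 10],
})

# Combine date and offset into a single lookup key
df["time_key"] = df["date"].astype(str) + "@" + df["timestamp"].astype(str)


def to_unix(time_key):
    """Convert a 'date@seconds' key to Unix seconds (naive dates treated as UTC)."""
    date_part, seconds_part = time_key.split("@")
    moment = pd.Timestamp(date_part) + pd.Timedelta(seconds=float(seconds_part))
    return moment.timestamp()


unique_keys = df["time_key"].unique()
# Only the unique keys go through the slow per-value conversion
time_dict = {key: to_unix(key) for key in unique_keys}
df["unix"] = df["time_key"].map(time_dict)

print(f"{len(unique_keys)} unique keys for {len(df)} rows")
print(df[["timestamp", "unix"]])
```

With this sample, 3 unique keys cover 6 rows, so only three conversions run. The closer the number of unique keys gets to the number of rows, the less the dictionary saves.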

score 0 · Answer 2 · answered Aug 10 '22 at 13:05

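I'm not exactly sure what type dateToday[0] has. But you could try a more vectorized approach:

import pandas as pd

df["unix"] = (
    (pd.Timestamp(dateToday[0]) + pd.to_timedelta(df["timestamp"], unit="seconds"))
    .astype("int64").div(10**9)  # assumes nanosecond resolution: ns since the epoch -> seconds
)

or, if dateToday[0] is already a pd.Timestamp:

df["unix"] = (
    (dateToday[0] + pd.to_timedelta(df["timestamp"], unit="seconds"))
    .astype("int64").div(10**9)  # assumes nanosecond resolution: ns since the epoch -> seconds
)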
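To check that a vectorized version reproduces the original loop, the following sketch runs both on a small made-up frame. The sample values, and the assumption that `dateToday` holds `pd.Timestamp` objects, are mine, since the question does not show the data. It uses `.dt.total_seconds()` rather than an integer cast so the result does not depend on the datetime resolution. The last variant skips datetime arithmetic per row entirely: it converts the base date to epoch seconds once and adds the offsets as plain numbers.

```python
from datetime import timedelta

import pandas as pd

# Hypothetical inputs; the question does not show the real data
df = pd.DataFrame({"timestamp": [10, 20, 30.5]})
dateToday = pd.Series([pd.Timestamp("2022-08-08")])

# Original row-by-row approach, kept for comparison
loop_result = []
for seconds in df["timestamp"]:
    date_and_time = dateToday[0] + timedelta(seconds=float(seconds))
    loop_result.append(date_and_time.timestamp())

# Vectorized: add all offsets at once, then measure the distance from the epoch
moments = pd.Timestamp(dateToday[0]) + pd.to_timedelta(df["timestamp"], unit="s")
vectorized = (moments - pd.Timestamp("1970-01-01")).dt.total_seconds()

# Plain arithmetic: convert the base date once, then add the offsets as floats
base_seconds = pd.Timestamp(dateToday[0]).timestamp()
arithmetic = base_seconds + df["timestamp"].astype(float)

df["unix"] = arithmetic
print(df)
```

Both variants treat a naive base date as UTC, which matches what the integer cast in the original loop does.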
