I have a dataframe and a function that I want to apply on multiple columns via pandas.apply. Currently I am doing so using a for loop, which I would like to replace with a single line of code.
This is my dataframe:
d1 = {'id': [1, 1, 2], 'event': ['e', 'c', 'e'], 'var1': [1, 2, 2], 'time_difference': [0, 5, 2]}
df1 = pd.DataFrame(data=d1)
...
>> df1
id event var1 time_difference
0 1 e 1 0
1 1 c 2 5
2 2 e 2 2
This is the function that I want to apply:
def merge_based_on_timelimit(row):
return row[column_of_interest] if row['time_difference'] <= 1\
else pd.NA
These are my columns of interest (on which I want to apply the function on):
columns_of_interest = ['event', 'var1']
Currently I am applying my function on all columns of interest via a for loop:
for column_of_interest in columns_of_interest:
df1[column_of_interest] = df1.apply(merge_based_on_timelimit, axis=1)
However, I am looking for a way to skip the loop and to instead apply my function directly onto all columns of interest. How can I do this? So far I have tried the following:
df1[columns_of_interest] = df1[columns_of_interest].apply(merge_based_on_timelimit, axis=1)
which returns me the following error:
...
redcap[columns_of_interest] = redcap[columns_of_interest].apply(merge_based_on_timelimit, axis=1)
...
KeyError: 'time_difference'