I am trying to run a data cleanup script in Python. I have a base class with a method called cleanData(). Depending on the dataset returned, there are a number of date fields, all of which end in _DT but could start with anything (such as SCHEDULED_START_DT, SERVICE_DISRUPTION_START_DT, etc.). This code will support hundreds of reports, so instead of subclassing and overriding the method for each report, I would like to dynamically run a function on every field that ends in _DT, and only override the method (calling the parent method) when a report needs additional cleanup of its own. All the function does is convert a UTC epoch timestamp into a readable datetime in the local time zone. Below are a data sample and the code I have:
Sample data:
ID SCHEDULED_START_DT SERVICE_DISRUPTION_START_DT
0 1597669200 1597712400
1 1597667496 None
Code snippet:
import datetime as dt
import pandas as pd
# Build a small sample frame; the None becomes NaN, so that column is float64
d = {
    'ID': [0, 1],
    'SCHEDULED_START_DT': [1597669200, 1597667496],
    'SERVICE_DISRUPTION_START_DT': [1597712400, None],
}
df = pd.DataFrame(data=d)
# Convert each epoch column to a naive datetime in the machine's local time zone
df['SCHEDULED_START_DT'] = df['SCHEDULED_START_DT'].apply(lambda x: dt.datetime.fromtimestamp(x) if pd.notnull(x) else x)
df['SERVICE_DISRUPTION_START_DT'] = df['SERVICE_DISRUPTION_START_DT'].apply(lambda x: dt.datetime.fromtimestamp(x) if pd.notnull(x) else x)
Output from the code:
ID SCHEDULED_START_DT SERVICE_DISRUPTION_START_DT
0 2020-08-17 08:00:00 2020-08-17 20:00:00
1 2020-08-17 07:31:36 NaT
I think there is a way to dynamically apply the function to every field ending in _DT without explicit loops or conditional logic. I have seen some similar questions, but I can't figure out how to adapt them to this case.
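For illustration, here is a minimal sketch of one possible approach, not a definitive implementation. It selects the columns whose names end in _DT and converts them all in one statement with pd.to_datetime. The helper name epoch_to_local and the LOCAL_TZ value are assumptions made for this example: the sample output above is consistent with a UTC-5 offset, but the actual zone is not stated, so 'America/Chicago' is only a stand-in.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [0, 1],
    'SCHEDULED_START_DT': [1597669200, 1597667496],
    'SERVICE_DISRUPTION_START_DT': [1597712400, None],
})

# Assumption: the target zone is hard-coded here so the result is reproducible.
LOCAL_TZ = 'America/Chicago'

def epoch_to_local(col):
    """Hypothetical helper: epoch seconds (UTC) -> naive local datetimes."""
    return (pd.to_datetime(col, unit='s', utc=True)  # NaN/None becomes NaT
              .dt.tz_convert(LOCAL_TZ)               # shift to the local zone
              .dt.tz_localize(None))                 # drop the tz for display

# Pick every column whose name ends in _DT, whatever its prefix.
dt_cols = df.columns[df.columns.str.endswith('_DT')]

# DataFrame.apply calls the helper once per selected column.
df[dt_cols] = df[dt_cols].apply(epoch_to_local)

print(df)
```

DataFrame.apply still iterates over the selected columns internally, but no explicit loop or per-column statement is needed in the calling code, so the same lines could sit in the base class's cleanData() and serve every report regardless of which _DT fields it returns.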
Thank you in advance for any help.
Pete