I am trying to create a dummy file that I will later use for ML predictions. The input is about 2000 'routes', and I want the dummy file to contain year-month-day-hour combinations for 7 days, i.e. 168 rows per route (7 days x 24 hours) and about 350k rows in total. The problem I am facing is that pandas becomes terribly slow at appending rows once the DataFrame reaches a certain size.
I am using the following code:
import datetime

import pandas as pd

DAYS = [0, 1, 2, 3, 4, 5, 6]
HODS = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23]
# Maps datetime.isoweekday() (1 = Monday ... 7 = Sunday) to a day name
ISODOW = {
    1: "monday",
    2: "tuesday",
    3: "wednesday",
    4: "thursday",
    5: "friday",
    6: "saturday",
    7: "sunday"
}
def createMyPredictionDummy(start=datetime.datetime.now(), sourceFile=(utils.mountBasePath + 'routeProperties.csv'), destFile=(utils.outputBasePath + 'ToBePredictedTTimes.csv')):
    '''Generate a dummy file that can be used for predictions'''
    data = ['route', 'someProperties']
    dataFile = data + ['yr', 'month', 'day', 'dow', 'hod']
    # New DataFrame with all required columns
    file = pd.DataFrame(columns=dataFile)
    # Old data frame that has only the target columns
    df = pd.read_csv(sourceFile, converters=convert, delimiter=',')
    df = df[data]
    # Counter - To avoid constant lookup for length of the DF
    ix = 0
    routes = df['route'].drop_duplicates().tolist()
    # Iterate through all routes and create a row for every route-yr-month-day-hour combination for 7 days --> about 350k rows
    for no, route in enumerate(routes):
        print('Current route is %s which is no. %g out of %g' % (str(route), no+1, len(routes)))
        routeDF = df.loc[df['route'] == route].iloc[0].tolist()
        for i in range(0, 7):
            tmpDate = start + datetime.timedelta(days=i)
            day = tmpDate.day
            month = tmpDate.month
            year = tmpDate.year
            dow = ISODOW[tmpDate.isoweekday()]
            for hod in HODS:
                file.loc[ix] = routeDF + [year, month, day, dow, hod]  # This is becoming terribly slow
                ix += 1
    file.to_csv(destFile, index=False)
    print('Wrote file')
I think the main problem lies in appending each row with .loc[], which gets slower as the DataFrame grows.
- Is there any way to append a row more efficiently?
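For reference, here is a minimal sketch of one possible alternative: collect all rows in a plain Python list and build the DataFrame once at the end, instead of growing it row by row with .loc[]. The function name build_dummy, the small sample routes_df and the fixed start date are assumptions made purely for illustration; they stand in for routeProperties.csv and are not part of the code above. I have not measured this at the full 350k rows.

```python
import datetime

import pandas as pd

ISODOW = {1: "monday", 2: "tuesday", 3: "wednesday", 4: "thursday",
          5: "friday", 6: "saturday", 7: "sunday"}


def build_dummy(routes_df, start, days=7):
    """Return one row per route/day/hour combination.

    routes_df is assumed to hold one row per route (hypothetical input).
    """
    columns = list(routes_df.columns) + ['yr', 'month', 'day', 'dow', 'hod']

    # The date/hour part is identical for every route, so compute it once.
    time_rows = []
    for i in range(days):
        d = start + datetime.timedelta(days=i)
        for hod in range(24):
            time_rows.append([d.year, d.month, d.day, ISODOW[d.isoweekday()], hod])

    # Appending to a Python list is cheap; no DataFrame is touched in the loop.
    rows = []
    for route_values in routes_df.itertuples(index=False, name=None):
        for t in time_rows:
            rows.append(list(route_values) + t)

    # Build the DataFrame a single time from the collected rows.
    return pd.DataFrame(rows, columns=columns)


# Hypothetical stand-in for the contents of routeProperties.csv
routes_df = pd.DataFrame({'route': ['A', 'B', 'C'],
                          'someProperties': [1.5, 2.5, 3.5]})
dummy = build_dummy(routes_df, start=datetime.datetime(2024, 1, 1))
print(dummy.shape)
```

With three sample routes this should give 3 x 168 = 504 rows and 7 columns; the result could then be written with to_csv as before.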
If you have any other suggestions, I am happy to hear them all!
Thanks and best,
carbee