I've got a Pandas dataframe (attendance_records). It contains the attendance records from a local school.
Each row has 4 columns: 1) the name of the student who was absent (name), 2) the address of the student (address), 3) the first day of absence (start), and 4) the last day of absence (end).
For example, one row could be:
Bobby 101 1st Street 9/1/2014 9/3/2014
This means that Bobby was absent each day between 9/1 and 9/3 (inclusive).
I want to unfold the table as follows:
Bobby 101 1st Street 9/1/2014
Bobby 101 1st Street 9/2/2014
Bobby 101 1st Street 9/3/2014
So far I have some code (below) that will do the job, but it's realllly slow on large tables, because it's basically going row by row through the table. Any ideas about how to make things faster?
import pandas as pd
def full_data(dataframe):
allframe = pd.DataFrame()
for i in dataframe.index:
newframe = pd.DataFrame()
newframe['dates'] = pd.date_range(dataframe.iloc[i].start, dataframe.iloc[i].end, freq = 'D')
newframe['name'] = dataframe.iloc[i]['name']
newframe['address'] = dataframe.iloc[i]['address']
allframe = allframe.append(newframe)
if i%1000 == 0:
print i
return allframe
attendance_records = full_data(attendance_records)