I have a time point dataset with 1,174,697 rows and a dataframe containing 876,923 rows. The dataframe consists of the following columns: time, target, type.
I want to iterate over the dataframe so that, for each row, I compare its "time" with the time points in the dataset, find all the time points equal to "time", and then pick the "target"-th one among them. For example, if there are 5 time points equal to "time" and target is 3, it picks the 4th one counting from the beginning, because target acts as a zero-based index.
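To make the selection rule concrete, here is a minimal sketch on made-up data. The array values and the names `timepoints`, `matches`, and `chosen` are hypothetical and only serve the illustration; they are not taken from my real dataset.

```python
import numpy as np

# Toy stand-ins for the real data (hypothetical values)
timepoints = np.array([7, 3, 7, 7, 9, 7, 7])
time, target = 7, 3

# Positions of every time point equal to `time`, in order of appearance
matches = np.flatnonzero(timepoints == time)

# `target` is a zero-based index into those matches, so 3 picks the 4th match
chosen = matches[target]
print(matches, chosen)
```

Here five time points equal 7, and with target 3 the fourth of them is selected.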
My code is below. The problem is that the two nested loops take forever to run. How can I improve the performance?
import pickle as pk
import pandas as pd

# `file` and `track_df` are defined earlier
timepoint_ds = file['/timepoints']
df = track_df.loc[:, ['time', 'target', 'type']]
label_imgindex_df = pd.DataFrame()
for index, row in df.iterrows():
    print("---Row--------------:", index)
    hdf_index = row["target"]
    label = row["type"]
    time = row["time"]
    image_index_list = []
    # Inner loop: scan every time point for a match (this is the slow part)
    for i, value in enumerate(timepoint_ds):
        if value == time:
            image_index_list.append(i)
    # Keep the target-th match and append it to the accumulated result
    label_index_df = pd.DataFrame({'index': image_index_list[hdf_index], 'label': label}, index=[i])
    label_imgindex_df = pd.concat([label_imgindex_df, label_index_df])
with open('/home/usr/label_imgindex_df.pkl', 'wb') as f:
    pk.dump(label_imgindex_df, f)
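One direction that might avoid the nested loops, shown only as a rough sketch on toy data and not tested on the real files: number each time point by its rank among equal values, then join that table to the dataframe on (time, target). The function name `match_image_indices` and the toy values are hypothetical. The sketch assumes the time points fit in memory as a 1-D array and that every (time, target) pair actually has a match; unmatched rows would come back with a missing index.

```python
import numpy as np
import pandas as pd

def match_image_indices(timepoints, df):
    """For each row of df, look up the position in `timepoints` of the
    target-th (zero-based) occurrence of that row's time."""
    tp = pd.DataFrame({"time": np.asarray(timepoints)})
    tp["index"] = np.arange(len(tp))
    # Rank of each time point among equal values, in original order
    tp["target"] = tp.groupby("time").cumcount()
    # A left join keeps one output row per row of df, in df's order
    merged = df.merge(tp, on=["time", "target"], how="left")
    return merged.rename(columns={"type": "label"})[["index", "label"]]

# Toy data (hypothetical values)
timepoints = np.array([0.0, 1.0, 1.0, 2.0, 1.0, 2.0])
toy_df = pd.DataFrame({
    "time": [1.0, 2.0, 1.0],
    "target": [2, 0, 0],
    "type": ["a", "b", "c"],
})
result = match_image_indices(timepoints, toy_df)
print(result)
```

The idea is that the ranking and the join each pass over the data once, instead of rescanning all time points for every dataframe row. The join compares times for exact equality, just like `value == time` in the loop above.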