I have the code below, which processes about 60k rows and takes around 4 hours to finish. That isn't feasible, so I want to use apply instead of iterrows to cut the running time. Here is the code:
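```python
import pandas as pd

all_merged_k = pd.DataFrame(columns=all_merged_f.columns)
for index, row in all_merged_f.iterrows():
    if row['route_count'] == 0:
        # No routes: keep the row unchanged
        all_merged_k = all_merged_k.append(row)
    else:
        # One output row per entry in the nested 'routes' list
        for i in range(row['route_count']):
            row1 = row.copy()
            row1['Route Number'] = i + 1  # 1-based, as in the expected output
            row1['Route_Broken'] = row['routes'][i]
            # DataFrame.append copies the whole frame on every call (removed in pandas 2.0)
            all_merged_k = all_merged_k.append(row1)
```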
Basically, if route_count is 0, the code appends the row unchanged. Otherwise it appends route_count copies of the row, identical in every column except that the nested routes list is broken up: each copy gets one route in a new column, Route_Broken, and that route's position in another new column, Route Number.
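A minimal sketch of the same loop without the repeated copying, assuming `routes` holds a list per row and `route_count` equals its length. Rows are collected as plain dicts and the DataFrame is built once at the end; `break_routes_loop` is a hypothetical helper name. It is still a Python-level loop, but it removes the quadratic cost of growing a DataFrame one row at a time.

```python
import pandas as pd


def break_routes_loop(df: pd.DataFrame) -> pd.DataFrame:
    """Same logic as the iterrows loop, but builds the result only once."""
    records = []
    for _, row in df.iterrows():
        if row['route_count'] == 0:
            # No routes: keep the row as-is (new columns become NaN later)
            records.append(row.to_dict())
        else:
            for i in range(int(row['route_count'])):
                rec = row.to_dict()
                rec['Route Number'] = i + 1            # 1-based position
                rec['Route_Broken'] = row['routes'][i]  # one route per output row
                records.append(rec)
    # A list of dicts becomes a DataFrame; missing keys are filled with NaN
    return pd.DataFrame(records)
```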
Sample data:

| routes | route_count |
|---|---|
| [[CHN-IND]] | 1 |
| [[CHN-IND],[IND-KOR]] | 2 |

Expected output:

| routes | route_count | Broken_Route | Route Number |
|---|---|---|---|
| [[CHN-IND]] | 1 | [CHN-IND] | 1 |
| [[CHN-IND],[IND-KOR]] | 2 | [CHN-IND] | 1 |
| [[CHN-IND],[IND-KOR]] | 2 | [IND-KOR] | 2 |
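For reference, a vectorized sketch that produces the expected output above without a per-row loop or apply, assuming `routes` holds a Python list per row. It uses `DataFrame.explode` and numbers the pieces with `groupby(level=0).cumcount()`. Resetting the index first gives every original row a unique label, so the numbering restarts at 1 for each row. `break_routes` is a hypothetical helper name, and it uses the length of `routes` rather than `route_count`.

```python
import numpy as np
import pandas as pd


def break_routes(df: pd.DataFrame) -> pd.DataFrame:
    # Fresh 0..n-1 index so each original row has its own label
    out = df.reset_index(drop=True)
    out['Broken_Route'] = out['routes']
    # One row per route; an empty list yields a single row with NaN
    out = out.explode('Broken_Route')
    # Pieces of the same original row share a label, so cumcount restarts per row
    num = out.groupby(level=0).cumcount().add(1).to_numpy()
    has_route = out['Broken_Route'].notna().to_numpy()
    # Rows without any route get NaN instead of a route number
    out['Route Number'] = np.where(has_route, num, np.nan)
    return out.reset_index(drop=True)
```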
Is it possible to do this with apply? Four hours is far too long to put into production, so any help is much appreciated.
The following attempts don't work:
```python
df.join(df['routes'].explode().rename('Broken_Route')) \
  .assign(**{'Route Number': lambda x: x.groupby(level=0).cumcount().add(1)})
```

or

```python
(df.assign(Broken_Route=df['routes'],
           count=df['routes'].str.len().apply(range))
   .explode(['Broken_Route', 'count'])
)
```
They break when index values repeat: in the last row of the output, Route Number should be 1.
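One likely cause, assuming the real frame has repeated index labels (for example left over from an earlier concat or merge): `groupby(level=0).cumcount()` treats all rows sharing a label as one group, so the count keeps running across different original rows. The `df.join` in the first attempt also pairs every equal label on both sides, which duplicates rows. The small frame below is hypothetical; it only illustrates the effect and the fix of calling `reset_index(drop=True)` before exploding.

```python
import pandas as pd

# Hypothetical frame whose two rows share the index label 0
df = pd.DataFrame(
    {'routes': [[['CHN-IND']], [['CHN-IND'], ['IND-KOR']]], 'route_count': [1, 2]},
    index=[0, 0],
)

# Both original rows end up in one group, so numbering does not restart
bad = df.explode('routes').groupby(level=0).cumcount() + 1   # → [1, 2, 3]

# A unique label per original row makes the numbering restart per row
good = df.reset_index(drop=True).explode('routes').groupby(level=0).cumcount() + 1   # → [1, 1, 2]
```

If the original index is needed later, `reset_index()` without `drop=True` keeps it as a column that can be restored with `set_index` afterwards. Note also that `range` in the second attempt starts at 0, so that count needs `+ 1` to match the expected 1-based output.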