I'm optimizing some data science code and have found a slow method. I would like some tips to improve it. Right now I'm testing with a data frame of 43000 rows, and the method takes around 50 seconds to execute.
I have read about the methods .loc, .iloc, .at, .iat, .iterrows and .itertuples as ways to get better performance when iterating over a data frame. I think they could help here, since the method currently runs in a for loop.
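As a rough orientation, the accessors listed above can be grouped as follows: .loc and .at select by index label, .iloc and .iat select by integer position, and .at/.iat are the scalar-only fast paths. The toy frame below (df, with made-up values and index labels) is hypothetical and only illustrates the calls. Note that .at, .iat and .itertuples still leave you looping in Python, one element or row at a time, whereas .loc with a boolean mask selects many rows in a single call.

```python
import pandas as pd

# Hypothetical toy frame, only to illustrate the accessors named above.
df = pd.DataFrame({"a": [10, 20, 30], "b": ["x", "y", "z"]}, index=[100, 101, 102])

by_label = df.loc[101, "a"]      # .loc: select by index label and column name
by_position = df.iloc[1, 0]      # .iloc: select by integer position (row 1, column 0)
scalar_label = df.at[101, "a"]   # .at: fast access to a single value by label
scalar_pos = df.iat[1, 0]        # .iat: fast access to a single value by position

# .loc also accepts a boolean mask, selecting all matching rows in one call
big_a = df.loc[df["a"] > 15, "b"].tolist()

# .itertuples yields one namedtuple per row; this is still a Python-level loop
rows = [(row.Index, row.a) for row in df.itertuples()]

print(by_label, by_position, scalar_label, scalar_pos)
print(big_a)
print(rows)
```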
    import numpy as np

    def slow_method(sliced_data_frame, labels_nd_array):
        sliced_data_frame['column5'] = -1  # creating a new column
        for label in np.unique(labels_nd_array):
            sliced_data_frame['column5'][labels_nd_array == label] = label
        return sliced_data_frame
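Reading the loop closely, every row ends up with column5 set to its own entry of labels_nd_array: each distinct label is written to exactly the rows where it occurs, and every entry of labels_nd_array is one of those distinct labels. Assuming labels_nd_array is a 1-D array with one value per row of sliced_data_frame, and that it contains no NaN (NaN never compares equal, so those rows would keep -1), the whole loop can likely be replaced by a single column assignment. Each pass of the loop also compares the full array against one label, so the work grows with rows times distinct labels, which is a plausible source of the slowdown. The sketch below uses hypothetical names (loop_version, vectorized_version) and random sample data, writes the loop with .loc instead of chained indexing, and checks that both versions agree.

```python
import numpy as np
import pandas as pd


def loop_version(df, labels):
    """The original loop, using .loc[mask, column] instead of chained indexing."""
    df = df.copy()  # work on a copy so the caller's frame is not modified
    df["column5"] = -1
    for label in np.unique(labels):
        # each iteration scans the whole `labels` array once
        df.loc[labels == label, "column5"] = label
    return df


def vectorized_version(df, labels):
    """Assumes `labels` is 1-D, has one entry per row, and contains no NaN."""
    df = df.copy()
    # Every row receives the label at its own position, which is what the
    # loop ends up doing, so one assignment is enough.
    df["column5"] = labels
    return df


# Hypothetical sample data standing in for the real 43000-row frame.
rng = np.random.default_rng(0)
n = 1000
frame = pd.DataFrame({"column1": rng.normal(size=n)})
labels = rng.integers(0, 50, size=n)

slow = loop_version(frame, labels)
fast = vectorized_version(frame, labels)
print(slow["column5"].equals(fast["column5"]))
```

The sketch copies the frame for clarity; dropping the .copy() calls would match the in-place behavior of slow_method.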
I'm also having a hard time understanding what happens inside that for loop with [labels_nd_array == label]. The first part, sliced_data_frame['column5'], selects the column just created, but the part that follows confuses me.
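The expression labels_nd_array == label is evaluated element by element by NumPy and produces a boolean array of the same length, often called a mask. Putting that mask inside [] selects only the positions where it is True, and the assignment then writes label into exactly those positions. The small example below uses made-up values and a standalone Series in place of the real column. In the original chained form, sliced_data_frame['column5'][mask] = ..., the first [] returns the column and the second [] performs this masked assignment on it; pandas may treat that intermediate column as a copy (hence the SettingWithCopyWarning), which is why .loc[mask, 'column5'] is the usually recommended way to write it.

```python
import numpy as np
import pandas as pd

# Made-up labels and a column initialised to -1, mirroring slow_method.
labels_nd_array = np.array([2, 0, 2, 1, 0])
column5 = pd.Series([-1, -1, -1, -1, -1])

label = 2
mask = labels_nd_array == label   # element-wise comparison -> boolean array
print(mask.tolist())              # [True, False, True, False, False]

# Indexing with the mask picks only the positions where it is True,
# and the assignment writes `label` into exactly those positions.
column5[mask] = label
print(column5.tolist())           # [2, -1, 2, -1, -1]
```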