We use a pandas DataFrame in our project, and we realized that our program is very slow because of how slowly the DataFrame calculations run. Our code is below.
import pandas as pd

df_item_in_desc = pd.DataFrame(columns=df.columns)  # to hold all matching rows
for index in range(df.shape[0]):
    s1 = set(df.iloc[index]['desc_words_short'])
    if item_number in s1:
        df_item_in_desc = df_item_in_desc.append(df.iloc[index])
We check whether the item number is in the desc_words_short column, and if it is, we append that row to another DataFrame (df_item_in_desc). The logic is simple, but to find the matching rows we have to iterate over the whole DataFrame and check the condition for every row. Our DataFrame is fairly large, so running this code takes a long time. How can we speed this up? Can we use CPU parallelization for this task, or something else?
Note: We actually tried CPU parallelization but were not successful.
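
For reference, here is a small, self-contained version of the same logic with made-up data (the column values and item_number here are purely illustrative, and it assumes pandas < 2.0, where DataFrame.append still exists):

import pandas as pd

# Made-up data purely for illustration; the real DataFrame is much larger and
# desc_words_short holds a list of words taken from the item description.
df = pd.DataFrame({
    'item': ['A1', 'B2', 'C3'],
    'desc_words_short': [['A1', 'blue', 'box'], ['red', 'box'], ['A1', 'spare']],
})
item_number = 'A1'

# Collect every row whose desc_words_short contains item_number.
df_item_in_desc = pd.DataFrame(columns=df.columns)
for index in range(df.shape[0]):
    s1 = set(df.iloc[index]['desc_words_short'])
    if item_number in s1:
        df_item_in_desc = df_item_in_desc.append(df.iloc[index])  # DataFrame.append was removed in pandas 2.0

print(df_item_in_desc)  # expected: the rows at index 0 and 2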