I am trying to understand threading in Python, and I need to use it to translate a DataFrame with 204623 rows much faster. I need to know how many threads I should run and how to split the rows between the threads by index, for example with a `for i in ...` loop (I'm not sure about the parameters of the translate function). Could you please check my code and correct it? This is my code:

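    import threading

    import pandas as pd
    from googletrans import Translator  # assumption: Translator is googletrans' synchronous one

    N_THREADS = 8  # translation mostly waits on the network, so a few threads suffice
    results = [None] * N_THREADS  # one slot per thread, so threads never share a write target

    def trans(number, chunk):
        translator = Translator()  # one translator per thread
        # Translate only this thread's share of the rows, not the whole column.
        results[number] = chunk.apply(lambda x: translator.translate(x, dest='fr').text)
        print("Thread " + str(number))

    thread_list = []
    for i in range(N_THREADS):
        # Thread i takes rows i, i + N_THREADS, i + 2 * N_THREADS, ...
        chunk = df['ingredients_text'].iloc[i::N_THREADS]
        t = threading.Thread(target=trans, args=(i, chunk))
        thread_list.append(t)

    for thread in thread_list:
        thread.start()

    for thread in thread_list:
        thread.join()

    # Reassemble the pieces; the assignment aligns on the index (assumed unique).
    df['translated'] = pd.concat(results)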
Yoss
  • For CPU-bound tasks, you'll have significantly diminished results once you have more threads than cores on the computer. That means you should have 8-10 threads, max; and even that may be a little high. Also, are numpy arrays thread-safe? – Carcigenicate May 29 '21 at 14:30
  • Did you search with something like `pandas dataframe apply parallel site:stackoverflow.com`?? Does [pandas multiprocessing apply](https://stackoverflow.com/questions/26784164/pandas-multiprocessing-apply) answer your question? – wwii May 29 '21 at 14:37
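
Following the comments above, here is a minimal sketch of splitting the work across a small, fixed pool of threads rather than starting one thread per index. It uses only the standard library's `concurrent.futures.ThreadPoolExecutor`. The names `fake_translate` and `translate_all` are hypothetical and not from the question. `fake_translate` stands in for `translator.translate(x, dest='fr').text` so that the example runs without a network connection.

```python
from concurrent.futures import ThreadPoolExecutor


def fake_translate(text):
    # Hypothetical stand-in for translator.translate(text, dest='fr').text,
    # so the sketch runs without a network connection.
    return "fr:" + text


def translate_all(texts, translate=fake_translate, max_workers=8):
    # executor.map hands each text to a free worker thread and yields
    # the results in the same order as the inputs.
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(translate, texts))


texts = ["sugar", "salt", "water"]  # e.g. df['ingredients_text'].tolist()
translated = translate_all(texts)
print(translated)
```

With the DataFrame from the question, the returned list could be assigned back with something like `df['translated'] = translate_all(df['ingredients_text'].tolist(), translate=...)`, since the results keep the input order. The value `max_workers=8` is only a starting point. Calling a remote translation service is mostly waiting rather than computing, so the useful thread count may differ from the core count, and the service may throttle many parallel requests.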

0 Answers