While trying to get multiprocessing to work (and understand it) in python 3.3 I quickly reverted to joblib to make my life easier. But I experience something very strange (in my point of view). When running this code (just to test if it works):
Parallel(n_jobs=1)(delayed(sqrt)(i**2) for i in range(200000))
It takes about 9 seconds but by increasing n_jobs
it actually takes longer... for n_jobs=2
it takes 25 seconds and n_jobs=4
it takes 27 seconds.
Correct me if I'm wrong... but shouldn't it instead be much faster if n_jobs
increases? I have an Intel I7 3770K so I guess it's not the problem of my CPU.
Perhaps giving my original problem can increase the possibility of an answer or solution.
I have a list of 30k+ strings, data
, and I need to do something with each string (independent of the other strings), it takes about 14 seconds. This is only the test case to see if my code works. In real applications it will probably be 100k+ entries so multiprocessing is needed since this is only a small part of the entire calculation.
This is what needs to be done in this part of the calculation:
data_syno = []
for entry in data:
w = wordnet.synsets(entry)
if len(w)>0: data_syno.append(w[0].lemma_names[0])
else: data_syno.append(entry)