The following code works for me, though. I don't use func and call pre_process directly. I also use a context manager (a with statement) for the pool. Below is the code running in IPython.
In [1]: from multiprocessing import Pool, TimeoutError
...: import multiprocessing as mp
...: import os
...: import re
...: import time
In [2]: text = ["The Rock is destined to be the 21st Century 's new `` Conan '' and that he 's going to make a splash even greater than Arnold Schwarzenegger , Jean-Claud Van Damme or Steven Segal .",
...: "The gorgeously elaborate continuation of `` The Lord of the Rings '' trilogy is so huge that a column of words can not adequately describe co-writer/director Peter Jackson 's expanded vision of J.R.R. Tolkien 's Middle-earth .",
...: 'Singer/composer Bryan Adams contributes a slew of songs -- a few potential hits , a few more simply intrusive to the story -- but the whole package certainly captures the intended , er , spirit of the piece .',
...: "You 'd think by now America would have had enough of plucky British eccentrics with hearts of gold .",
...: 'Yet the act is still charming here .',
...: "Whether or not you 're enlightened by any of Derrida 's lectures on `` the other '' and `` the self , '' Derrida is an undeniably fascinating and playful fellow .",
...: 'Just the labour involved in creating the layered richness of the imagery in this chiaroscuro of madness and light is astonishing .',
...: 'Part of the charm of Satin Rouge is that it avoids the obvious with humour and lightness .',
...: "a screenplay more ingeniously constructed than `` Memento ''",
...: "`` Extreme Ops '' exceeds expectations ."]
In [3]: def pre_process(text):
...:     '''
...:     function to pre-process and clean text
...:     '''
...:     stop_words = ['in', 'of', 'at', 'a', 'the']
...:
...:     # lowercase
...:     text = str(text).lower()
...:
...:     # remove special characters except spaces, apostrophes and dots
...:     text = re.sub(r"[^a-zA-Z0-9.']+", ' ', text)
...:
...:     # remove stopwords
...:     text = [word for word in text.split(' ') if word not in stop_words]
...:
...:     return ' '.join(text)
In [4]: %%time
...: result = []
...: for x in text:
...:     result.append(pre_process(x))
...:
...:
CPU times: user 500 µs, sys: 59 µs, total: 559 µs
Wall time: 569 µs
In [5]: %%time
...: with Pool(mp.cpu_count()) as pool:
...:     results = pool.map(pre_process, text)
...:
...:
CPU times: user 4.58 ms, sys: 29 ms, total: 33.6 ms
Wall time: 137 ms
In [6]: results
Out[6]:
["rock is destined to be 21st century 's new conan '' and that he 's going to make splash even greater than arnold schwarzenegger jean claud van damme or steven segal .",
"gorgeously elaborate continuation lord rings '' trilogy is so huge that column words can not adequately describe co writer director peter jackson 's expanded vision j.r.r. tolkien 's middle earth .",
'singer composer bryan adams contributes slew songs few potential hits few more simply intrusive to story but whole package certainly captures intended er spirit piece .',
"you 'd think by now america would have had enough plucky british eccentrics with hearts gold .",
'yet act is still charming here .',
"whether or not you 're enlightened by any derrida 's lectures on other '' and self '' derrida is an undeniably fascinating and playful fellow .",
'just labour involved creating layered richness imagery this chiaroscuro madness and light is astonishing .',
'part charm satin rouge is that it avoids obvious with humour and lightness .',
"screenplay more ingeniously constructed than memento ''",
" extreme ops '' exceeds expectations ."]
%%time is the IPython magic for measuring the execution time of a cell. Of course, with such a small sample, the multiprocessing version actually runs slower (137 ms vs 569 µs wall time) because of the overhead of creating new processes.
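To see where the pool starts to pay off, you could time both versions on a larger input. The following is only a sketch: the function names run_serial and run_parallel, the worker count, the chunksize value and the repeated sample sentence are my own assumptions, not part of the question. pre_process is repeated so that the file runs on its own.

```python
import re
import time
from multiprocessing import Pool

STOP_WORDS = {"in", "of", "at", "a", "the"}


def pre_process(text):
    """Same cleaning steps as pre_process above, repeated so this file is standalone."""
    text = str(text).lower()
    text = re.sub(r"[^a-zA-Z0-9.']+", " ", text)
    return " ".join(w for w in text.split(" ") if w not in STOP_WORDS)


def run_serial(texts):
    # Plain loop in the current process.
    return [pre_process(t) for t in texts]


def run_parallel(texts, workers=2, chunksize=1000):
    # chunksize batches many items into one task, so fewer messages
    # have to be pickled and sent between processes.
    with Pool(workers) as pool:
        return pool.map(pre_process, texts, chunksize=chunksize)


if __name__ == "__main__":
    # Hypothetical workload: one sample sentence repeated many times.
    texts = ["The Rock is destined to be the 21st Century 's new `` Conan ''"] * 20_000

    start = time.perf_counter()
    serial = run_serial(texts)
    print(f"serial:   {time.perf_counter() - start:.3f} s")

    start = time.perf_counter()
    parallel = run_parallel(texts)
    print(f"parallel: {time.perf_counter() - start:.3f} s")

    # Both paths must produce the same cleaned strings.
    assert serial == parallel
```

Which version wins depends on your machine and on how expensive the function is per item, so measure with your real data rather than relying on these numbers.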
Anyway, with a pandas DataFrame you can convert the column (a Series) to a list with list(), as below, instead of iterating through it by index, which is noticeably more efficient:
list(df.text)
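To make the conversion concrete, here is a small sketch. The DataFrame contents are made up for illustration, and Series.tolist() is an extra option from pandas that I did not time here. All three forms return the same plain Python list when the DataFrame has the default integer index.

```python
import pandas as pd

# Hypothetical data standing in for your real DataFrame.
df = pd.DataFrame({"text": ["Yet the act is still charming here .",
                            "`` Extreme Ops '' exceeds expectations ."]})
s = df.text

by_index = [s[i] for i in range(len(s))]  # element-by-element lookup
by_list = list(s)                         # iterates over the Series values
by_tolist = s.tolist()                    # pandas' own conversion method

print(by_index == by_list == by_tolist)
```

Any of these lists can then be passed straight to pool.map.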
Below is a performance comparison of list() against iterating through the Series by index, as you did. The total CPU time is 197 µs vs 564 µs.
In [52]: %%time
...: [s[i] for i in range(len(s))]
...:
...:
CPU times: user 499 µs, sys: 65 µs, total: 564 µs
Wall time: 506 µs
Out[52]:
["The Rock is destined to be the 21st Century 's new `` Conan '' and that he 's going to make a splash even greater than Arnold Schwarzenegger , Jean-Claud Van Damme or Steven Segal .",
"The gorgeously elaborate continuation of `` The Lord of the Rings '' trilogy is so huge that a column of words can not adequately describe co-writer/director Peter Jackson 's expanded vision of J.R.R. Tolkien 's Middle-earth .",
'Singer/composer Bryan Adams contributes a slew of songs -- a few potential hits , a few more simply intrusive to the story -- but the whole package certainly captures the intended , er , spirit of the piece .',
"You 'd think by now America would have had enough of plucky British eccentrics with hearts of gold .",
'Yet the act is still charming here .',
"Whether or not you 're enlightened by any of Derrida 's lectures on `` the other '' and `` the self , '' Derrida is an undeniably fascinating and playful fellow .",
'Just the labour involved in creating the layered richness of the imagery in this chiaroscuro of madness and light is astonishing .',
'Part of the charm of Satin Rouge is that it avoids the obvious with humour and lightness .',
"a screenplay more ingeniously constructed than `` Memento ''",
"`` Extreme Ops '' exceeds expectations ."]
In [53]: %%time
...: list(s)
...:
...:
CPU times: user 174 µs, sys: 23 µs, total: 197 µs
Wall time: 215 µs
Out[53]:
["The Rock is destined to be the 21st Century 's new `` Conan '' and that he 's going to make a splash even greater than Arnold Schwarzenegger , Jean-Claud Van Damme or Steven Segal .",
"The gorgeously elaborate continuation of `` The Lord of the Rings '' trilogy is so huge that a column of words can not adequately describe co-writer/director Peter Jackson 's expanded vision of J.R.R. Tolkien 's Middle-earth .",
'Singer/composer Bryan Adams contributes a slew of songs -- a few potential hits , a few more simply intrusive to the story -- but the whole package certainly captures the intended , er , spirit of the piece .',
"You 'd think by now America would have had enough of plucky British eccentrics with hearts of gold .",
'Yet the act is still charming here .',
"Whether or not you 're enlightened by any of Derrida 's lectures on `` the other '' and `` the self , '' Derrida is an undeniably fascinating and playful fellow .",
'Just the labour involved in creating the layered richness of the imagery in this chiaroscuro of madness and light is astonishing .',
'Part of the charm of Satin Rouge is that it avoids the obvious with humour and lightness .',
"a screenplay more ingeniously constructed than `` Memento ''",
"`` Extreme Ops '' exceeds expectations ."]