I have a spell check function (Peter Norvig's spell correction) that works on small data frames, but on a data frame with 5000 words it takes so long to run that I end up stopping the program. Does anyone have a solution?

# Correct spelling
import enchant
from spellchecker import SpellChecker

spell = SpellChecker()

def spell_correct(text):
    output = ""
    try:
        d = enchant.Dict("en_US")
        for word in text.split():
            if d.check(word):
                output = output + word + " "
            else:
                # correction() can return None when no candidate is found
                output = output + (spell.correction(word) or word) + " "
    except Exception as e:
        print(e)
    return output

df["Text"] = df["Text"].apply(spell_correct)
df

saba kjh
  • I think execution time depends on the programming language and on how the code is written. I had an issue like this: a Laravel project took too long once it had more than 30,000 records, so I solved it by paginating on the back end. Before that it was paginated on the front end. – Nurbek Boymurodov Sep 15 '20 at 12:49
  • Also, I think long execution times on big data are a common situation. – Nurbek Boymurodov Sep 15 '20 at 12:51
  • So let the program run? @NurbekBoymurodov – saba kjh Sep 15 '20 at 12:53
  • First, you could create `d = enchant.Dict("en_US")` only once, outside the function `spell_correct()`. It should work a little faster. Second, you could keep `output` as a list of items and join it into a string only once, at the end. That may also work a little faster. But if it is still too slow, you may have to rewrite it in another language (C/C++/Go/Rust/Julia) or try `Cython` or `Numba`, which can compile some code to C or machine code. [Numba vs Cython: How to Choose](http://stephanhoyer.com/2015/04/09/numba-vs-cython-how-to-choose/). Or maybe it could run faster using `threading` or `multiprocessing`. – furas Sep 15 '20 at 14:54
  • See [pandarallel](https://github.com/nalepae/pandarallel) to run the code in parallel across several processes. – furas Sep 15 '20 at 14:58
  • @saba kjh Yup, let the program run or rewrite it in another language, as furas said. I have not written any projects in Go, but some of my friends who code in Go say it is faster than other languages. – Nurbek Boymurodov Sep 16 '20 at 05:22
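
The first two suggestions in the comments above (build the dictionary once, join the output once) could look roughly like the sketch below. It also caches each word's result with `functools.lru_cache`, which is an addition not mentioned in the thread; it may help when the same words recur across rows. The names `make_spell_corrector`, `is_known`, `correct`, `KNOWN`, `FIXES` and `slow_correct` are hypothetical, and the tiny stand-in dictionary only exists so the sketch runs without `enchant` or `spellchecker` installed.

```python
from functools import lru_cache
from typing import Callable, Optional


def make_spell_corrector(
    is_known: Callable[[str], bool],
    correct: Callable[[str], Optional[str]],
    cache_size: Optional[int] = None,
) -> Callable[[str], str]:
    """Build a text corrector around two word-level callables.

    is_known and correct are placeholders for calls such as
    enchant.Dict("en_US").check and SpellChecker().correction.
    Both objects should be created once, outside this function.
    """

    @lru_cache(maxsize=cache_size)
    def fix_word(word: str) -> str:
        # Each distinct word is checked and corrected at most once.
        if is_known(word):
            return word
        # Fall back to the original word if no correction is returned.
        return correct(word) or word

    def spell_correct(text: str) -> str:
        # Join the corrected words once instead of growing a string.
        return " ".join(fix_word(word) for word in text.split())

    return spell_correct


# Stand-in data (assumption) so the sketch runs without third-party packages.
KNOWN = {"the", "quick", "brown", "fox"}
FIXES = {"quikc": "quick", "brwon": "brown"}
lookups = []  # records which words reached the slow correction step


def slow_correct(word: str) -> Optional[str]:
    lookups.append(word)
    return FIXES.get(word)


spell_correct = make_spell_corrector(KNOWN.__contains__, slow_correct)
print(spell_correct("the quikc brwon fox"))
print(spell_correct("quikc quikc zzz"))
```

With the real libraries, one would presumably pass `enchant.Dict("en_US").check` and `SpellChecker().correction` as the two callables and then call `df["Text"].apply(spell_correct)` as in the question. How much this helps depends on how often words repeat in the data, so it is worth timing on a sample first.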

0 Answers