Memory error after switching to python 64-bits

Question

I am getting a MemoryError after switching to 64-bit Python. Here is my function:

import codecs
import re
import string

import numpy as np
from nltk.collocations import BigramCollocationFinder


def entr_langue(path,nom_langue):
    mots_ts=[]
    table_tr=dict((ord(char),None) for char in string.punctuation)  # translation table that deletes punctuation
    with codecs.open(path,"r","utf-8") as filep:

        for i,line in enumerate(filep):
            # process one line at a time, dropping the first token
            line=" ".join(line.split()[1:])
            line=line.lower()
            line=re.sub(r"\d+"," ",line)  # remove digits

            if len(line) !=0:
                line=line.translate(table_tr)  # remove punctuation
                mots_ts += line
                mots_ts.append(" ")  # add a space between lines

    ts_str=''.join(mots_ts)
    ts_str=re.sub(' +',' ',ts_str)  # collapse runs of spaces into a single space
    seq_ts=[i for i in ts_str]

    # now extract the bigrams and sort them by frequency
    fn=BigramCollocationFinder.from_words(seq_ts)
    fn.apply_freq_filter(6)  # bigrams with a frequency below 6 are filtered out
    bigram_model=sorted(fn.ngram_fd.viewitems(), key=lambda item: item[1],reverse=True)

    print (bigram_model)
    np.save(nom_langue+".npy",bigram_model)

The error:

File "C:/Users/msi/Documents/projIA/extraction_bigram.py", line 23, in entr_langue
    mots_ts += line
  MemoryError

The line `mots_ts += line` is very inefficient. Use `.append()` and `.extend()` for lists. — Klaus D., Jan 12 '19 at 03:38
You may need to also install the 64-bit version of the NLTK (or reinstall it after installing the 64-bit version of Python). — martineau, Jan 12 '19 at 03:39
@KlausD.: `list`s overload `+=` such that it's largely equivalent to `extend`. That said, there is a decent chance the OP should be using `append` here; since `line` is a `str`, `+=` (and `extend`) would both add each character from `line` individually, and they probably just want the whole line as a single value. — ShadowRanger, Jan 12 '19 at 03:41
Side-note: Folks, please stop using `codecs.open`. [It's buggy, slow, and unnecessary on Python 2.6 and higher, where `io.open` is available](https://stackoverflow.com/a/46438434/364696). On Py3, `open` is an alias of `io.open`, on Py2, `io.open` is basically a correct, efficient version of `codecs.open`. `with io.open(path,encoding="utf-8"):` is what you want here. — ShadowRanger, Jan 12 '19 at 03:43
It's also possible you can't use the NLTK with 64-bit Python... — martineau, Jan 12 '19 at 03:43
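
A minimal sketch of the two points raised in the comments above: reading the file with `io.open` instead of `codecs.open`, and the difference between `+=` and `append` when the right-hand side is a string. The sample file and the names `chars` and `lines` are made up for illustration; this is not the asker's data or a drop-in fix.

```python
import io
import os
import tempfile

# Write a tiny UTF-8 sample file (hypothetical data, for illustration only).
fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
with io.open(path, "w", encoding="utf-8") as f:
    f.write(u"1 Bonjour le monde\n2 Salut\n")

chars = []  # will hold one element per character
lines = []  # will hold one element per line

# io.open decodes UTF-8 and is available on Python 2.6+ and Python 3.
with io.open(path, encoding="utf-8") as filep:
    for line in filep:
        # Drop the first token and lowercase, as the question's code does.
        line = u" ".join(line.split()[1:]).lower()
        chars += line       # same effect as chars.extend(line)
        lines.append(line)  # keeps the whole line as a single value

os.remove(path)

print(len(chars), len(lines))
```

With `+=`, the list grows by one entry per character, so on a large corpus it ends up holding one list slot for every character in the file. With `append`, it grows by one entry per line.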

score 0 · Answer 1 · answered Jan 12 '19 at 03:33

If you didn't get this error on 32-bit Python, the problem is probably in how the code was ported. On 64-bit Python a list can hold far more elements than on a 32-bit build, more than any standard PC has the memory to actually fill. Even on a 32-bit OS, though, a list can't contain more than fits in about 4 GB (or something similar, I'm not sure of the exact figure).

There is a similar topic about memory limits: Memory errors and list limits?
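
To see which limits apply, it may help to confirm that the running interpreter really is a 64-bit build and to look at how much memory a list object takes. The following is a rough sketch using only the standard library; using `sys.getsizeof` for the size check is my assumption, not something copied from the linked topic, and the list `chars` is just sample data.

```python
import struct
import sys

# Pointer size in bits: 32 on a 32-bit interpreter, 64 on a 64-bit one.
pointer_bits = struct.calcsize("P") * 8

# sys.maxsize is 2**31 - 1 on 32-bit builds and 2**63 - 1 on 64-bit builds.
is_64bit = sys.maxsize > 2**32

# sys.getsizeof reports the size of the list object itself (mainly its
# array of pointers), not the size of the elements it refers to.
chars = list("x" * 1000)
list_bytes = sys.getsizeof(chars)

print(pointer_bits, is_64bit, list_bytes)
```

Each list slot is one pointer, so a list of single characters costs 8 bytes per character on a 64-bit build before counting the string objects themselves.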

Guaz

I assume you simply used up all the memory available to you. That link has example code for checking its size with the `sizeof()` function :) – Guaz Jan 12 '19 at 03:35
