
I have a text file with one text per line. A quick way to create the list object is:

from numpy import loadtxt
texts = loadtxt("myfile.txt", dtype=str, delimiter="\n", unpack=False)

However, my text file is 700 MB, so when executing this I get an OOM error (I have more than 125 GB of RAM).

Is there any other way of doing this without getting the out-of-memory error? For example, would a plain-Python read like the sketch below avoid it?
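
From what I understand, `loadtxt` with `dtype=str` builds a fixed-width unicode array, padding every row to the length of the longest line at 4 bytes per character, which may be why a 700 MB file overflows even 125 GB of RAM. A minimal plain-Python sketch of what I mean (assuming the file is UTF-8; the encoding is an assumption on my part):

texts = []
with open("myfile.txt", encoding="utf-8") as f:
    for line in f:
        # keep each line as an ordinary Python string, dropping the newline
        texts.append(line.rstrip("\n"))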

Troy
  • Does this answer your question? [Python out of memory on large CSV file (numpy)](https://stackoverflow.com/questions/8956832/python-out-of-memory-on-large-csv-file-numpy) – Nico Müller Jun 01 '20 at 13:33
  • I can't see how this answers my question, when the solution there suggests using `loadtxt` (the one I'm already using) instead of `genfromtxt` (the one in question in the link you provided). – Troy Jun 01 '20 at 13:45
  • I tried the solution and it is an enhancement of `loadtxt` for simple data (I don't know the content of your file). Have you tried the code posted in the accepted answer? (It's below the diagrams, under "Alternately, consider something like the following.") – Nico Müller Jun 01 '20 at 13:52
  • Why not `readlines`? What do you intend to do with the loaded object? – hpaulj Jun 01 '20 at 13:55
  • I'll be using the object to train a language model (BERT). – Troy Jun 01 '20 at 14:21
  • Your question is incomplete; a traceback showing where the memory error occurs (within `loadtxt`) might help, and a small sample of the file might as well. Have you even tested this on a small file? What shape and dtype did it produce? (A sketch of such a test follows these comments.) – hpaulj Jun 01 '20 at 15:15
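
A minimal sketch of the small-file test suggested in the last comment, assuming a short sample file `sample.txt` (a hypothetical name) in the same one-text-per-line format as `myfile.txt`:

from numpy import loadtxt

# Same call as in the question, run on a small sample file.
arr = loadtxt("sample.txt", dtype=str, delimiter="\n", unpack=False)

# loadtxt returns a fixed-width unicode array; a dtype like "<U5000" means
# every row is padded to 5000 characters, regardless of its actual length.
print(arr.shape, arr.dtype)

The reported dtype width times the number of rows, times 4 bytes per character, approximates the full array size, so the memory blow-up can be estimated before ever loading the 700 MB file.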

0 Answers