In relation to my other question here, this code works if I use a small chunk of my dataset with dtype='int32'. Using float64 produces a TypeError later in my main process because of safe casting rules, so I'll stick with int32 — but I'm still curious about the errors I'm getting.
fp = np.memmap("E:/TDM-memmap.txt", dtype='int32', mode='w+', shape=(len(documents), len(vocabulary)))
matrix = np.genfromtxt("Results/TDM-short.csv", dtype='int32', delimiter=',', skip_header=1)
fp[:] = matrix[:]
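As a side note, genfromtxt loads the whole CSV into RAM before the copy into the memmap. A chunked variant (a sketch only — tiny stand-in data and filenames here; the real script would use Results/TDM-short.csv and shape=(len(documents), len(vocabulary)), and max_rows needs NumPy 1.10+) fills the memmap block by block instead:

```python
import numpy as np

# Tiny stand-in data so the sketch is runnable; in the real script the
# shape and CSV would come from the document-term matrix.
rows, cols = 6, 4
data = np.arange(rows * cols, dtype='int32').reshape(rows, cols)
np.savetxt("TDM-short.csv", data, delimiter=',', fmt='%d',
           header=','.join('w%d' % i for i in range(cols)))

fp = np.memmap("TDM-memmap.dat", dtype='int32', mode='w+',
               shape=(rows, cols))

chunk = 2  # rows per block; tune to available RAM
with open("TDM-short.csv") as f:
    next(f)  # skip the header row
    for start in range(0, rows, chunk):
        # genfromtxt reads only max_rows lines from the open handle,
        # so successive calls continue where the last one stopped
        block = np.atleast_2d(np.genfromtxt(
            f, dtype='int32', delimiter=',',
            max_rows=min(chunk, rows - start)))
        fp[start:start + len(block)] = block
fp.flush()
print((np.asarray(fp) == data).all())  # True if the copy matches
```

This keeps peak memory at one chunk of rows rather than the whole matrix, though it does not change anything about whether the memmap itself can be created.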
If I use the full data (where shape=(329568, 27519)) with these dtypes:
I get a WindowsError when I use int32 or int
and
I get an OverflowError when I use float64
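For reference, the requested mapping sizes follow directly from the shape and itemsize; both are far beyond the roughly 2 GiB that a 32-bit process (which the C:\Python27 paths and the index-sized-integer OverflowError suggest this is) can map at once. A quick check:

```python
import numpy as np

rows, cols = 329568, 27519
for dtype in ('int32', 'float64'):
    # total bytes the memmap would need for this dtype
    nbytes = rows * cols * np.dtype(dtype).itemsize
    print('%-7s %d bytes (~%.1f GiB)' % (dtype, nbytes, nbytes / 2.0**30))
# int32 needs ~33.8 GiB and float64 ~67.6 GiB
```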
Why does this happen, and how can I fix it?
Edit: Added Tracebacks
Traceback for int32
Traceback (most recent call last):
File "C:/Users/zeferinix/PycharmProjects/Projects/NLP Scripts/NEW/LDA_Experimental1.py", line 123, in <module>
fp = np.memmap("E:/TDM-memmap.txt", dtype='int32', mode='w+', shape=(len(documents), len(vocabulary)))
File "C:\Python27\lib\site-packages\numpy\core\memmap.py", line 260, in __new__
mm = mmap.mmap(fid.fileno(), bytes, access=acc, offset=start)
WindowsError: [Error 8] Not enough storage is available to process this command
Traceback for float64
Traceback (most recent call last):
File "C:/Users/zeferinix/PycharmProjects/Projects/NLP Scripts/NEW/LDA_Experimental1.py", line 123, in <module>
fp = np.memmap("E:/TDM-memmap.txt", dtype='float64', mode='w+', shape=(len(documents), len(vocabulary)))
File "C:\Python27\lib\site-packages\numpy\core\memmap.py", line 260, in __new__
mm = mmap.mmap(fid.fileno(), bytes, access=acc, offset=start)
OverflowError: cannot fit 'long' into an index-sized integer
Edit: Added other info
Other info that might help: I have a 1 TB (931 GB usable) HDD with two partitions: Drive D (22.8 GB free of 150 GB), where my work files live, including this script and where the memmap will be written, and Drive E (406 GB free of 781 GB), where my torrent stuff goes. At first I tried to write the memmap file to Drive D; it generated a 1,903,283 KB file for int32 and a 3,806,566 KB file for float64. I thought I might be getting those errors because it was running out of space, so I tried Drive E, which should be more than enough, but it generated the same file sizes and gave the same errors.