I am trying to implement the solution given in this answer to read my ~3.3GB ASCII file into an ndarray.
However, I am getting a MemoryError when running this function against my file:
    def iter_loadtxt(filename, delimiter=None, skiprows=0, dtype=float):
        def iter_func():
            with open(filename, 'r') as infile:
                for _ in range(skiprows):
                    next(infile)
                for line in infile:
                    line = line.rstrip().split(delimiter)
                    for item in line:
                        yield dtype(item)
            iter_loadtxt.rowlength = len(line)

        data = np.fromiter(iter_func(), dtype=dtype)
        data = data.reshape((-1, iter_loadtxt.rowlength))
        return data

    data = iter_loadtxt(fname, skiprows=1)
I am now trying to pass different dtypes in the call to np.fromiter, in the hope that storing most of my columns as integers instead of floats will shrink the array enough to avoid the memory issue, but I have had no success so far.
My file is "many rows" x 7 cols, and I'd like to specify the following formats: float for the first three cols and uint for the remaining four. My OS is Windows 10 64-bit with 8GB of RAM, and I am using Python 2.7 32-bit.
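For what it's worth, the per-row savings can be sized up front: a structured dtype reports its bytes-per-record via itemsize. This is a rough sketch with field widths chosen for illustration (on my platform np.float/np.int may resolve to different widths), comparing an all-64-bit layout against the narrower float32/uint32 layout I am aiming for:

```python
import numpy as np

# Back-of-the-envelope check: itemsize gives bytes per record, so mixing
# narrower fields roughly halves the array's memory footprint.
full = np.dtype([('', np.float64)] * 3 + [('', np.int64)] * 4)     # all 8-byte fields
narrow = np.dtype([('', np.float32)] * 3 + [('', np.uint32)] * 4)  # all 4-byte fields

print(full.itemsize)    # 56 bytes per row
print(narrow.itemsize)  # 28 bytes per row
```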
My try was (following this answer):

    data = np.fromiter(iter_func(), dtype=[('',np.float),('',np.float),('',np.float),('',np.int),('',np.int),('',np.int),('',np.int)])

but I receive TypeError: expected a readable buffer object.
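For reference, here is a minimal sketch (with made-up values matching the 7-column layout) of what np.fromiter expects with a compound dtype: one tuple per record. Yielding the seven scalars one at a time, as the original iter_func does, is what triggers the buffer TypeError:

```python
import numpy as np

# Hypothetical 7-column records: three floats followed by four ints.
dt = np.dtype([('', np.float64)] * 3 + [('', np.int64)] * 4)

rows = [(1.0, 2.0, 3.0, 4, 5, 6, 7),
        (8.5, 9.5, 10.5, 11, 12, 13, 14)]

# One tuple per record works; yielding bare scalars one by one does not.
data = np.fromiter(iter(rows), dtype=dt)
print(data.shape)    # (2,)
print(data['f0'])    # first float column: [1.  8.5]
```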
EDIT1
Thanks to hpaulj, who provided the solution. Below is the working code.
    def iter_loadtxt(filename, delimiter=None, skiprows=0, dtype=float):
        def iter_func():
            dtypes = [float, float, float, int, int, int, int]
            with open(filename, 'r') as infile:
                for _ in range(skiprows):
                    next(infile)
                for line in infile:
                    line = line.rstrip().split(delimiter)
                    values = [t(v) for t, v in zip(dtypes, line)]
                    yield tuple(values)
            iter_loadtxt.rowlength = len(line)

        data = np.fromiter(iter_func(), dtype=[('',np.float),('',np.float),('',np.float),('',np.int),('',np.int),('',np.int),('',np.int)])
        return data

    data = iter_loadtxt(fname, skiprows=1)
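As a quick sanity check of the fixed function on a tiny throwaway file (filename and values are made up; the dtype aliases are spelled out as float64/int64 so the sketch also runs on newer NumPy, where np.float/np.int were removed):

```python
import os
import tempfile

import numpy as np

# Condensed copy of the working function, with explicit 64-bit field types.
def iter_loadtxt(filename, delimiter=None, skiprows=0):
    def iter_func():
        dtypes = [float, float, float, int, int, int, int]
        with open(filename, 'r') as infile:
            for _ in range(skiprows):
                next(infile)
            for line in infile:
                parts = line.rstrip().split(delimiter)
                yield tuple(t(v) for t, v in zip(dtypes, parts))
    return np.fromiter(iter_func(),
                       dtype=[('', np.float64)] * 3 + [('', np.int64)] * 4)

# Made-up two-row sample with a header line, mimicking the real file's layout.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as f:
    f.write('x y z a b c d\n')
    f.write('1.5 2.5 3.5 4 5 6 7\n')
    f.write('8.5 9.5 10.5 11 12 13 14\n')
    fname = f.name

data = iter_loadtxt(fname, skiprows=1)
os.remove(fname)
print(data.shape)     # (2,)
print(data['f3'])     # first integer column: [ 4 11]
```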