I'm trying to split a very large text file (~8 GB) into smaller files of 1 GB each. But when I try to get the chunks using zip_longest, it throws a MemoryError, as shown below:
Traceback (most recent call last):
  File "splitFile.py", line 125, in <module>
    splitFileIntoChunk(lineNumFileChunk, baseDirectoryPath, inputFile);
  File "splitFile.py", line 63, in splitFileIntoChunk
    for i, g in enumerate(group):
MemoryError
The same code works perfectly fine for an 800 MB file.
Also, the 8 GB file is successfully split into chunks of 512 MB each, but the error appears when I try to create chunks of 1 GB.
Is this related to the maximum amount of RAM that the Python interpreter is allowed to allocate? If so, how can we change that?
For the record, my system has 16 GB of RAM and an i7 processor. The RAM is also not fully consumed: I checked in Resource Monitor, and it uses around 2-4 GB of RAM.
Here is some code for reference:
def grouper(n, inputFileInterable, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    # grouper(3, 'ABCDEFG', 'x') --> ABC DEF Gxx
    args = [iter(inputFileInterable)] * n
    return zip_longest(fillvalue=fillvalue, *args)
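To make the behaviour of grouper concrete, here is a small self-contained sketch on a short string (the sample input is only for illustration). As far as I understand, each item produced by zip_longest is a complete tuple of n elements, so with a large line count per chunk, all lines of one chunk would be held in memory at the same time:

```python
from itertools import zip_longest


def grouper(n, iterable, fillvalue=None):
    """Collect data into fixed-length chunks or blocks."""
    # The same iterator repeated n times, so each tuple takes n consecutive items.
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)


groups = list(grouper(3, "ABCDEFG", "x"))
first = groups[0]

# Each group is a fully built tuple of n items, padded with the fill value.
print(type(first).__name__, len(first))
print(groups)
```

With a file object as input, every element of the tuple would be one line of text, so the size of a single tuple grows with the chunk size.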
And then:
def splitFileIntoChunk(lineCountChunkSize, absoluteFilePath, inputFileName):
    _fileName, _fileExtention = os.path.splitext(inputFileName)
    if not os.path.exists(absoluteFilePath + outputDirectory):
        os.makedirs(absoluteFilePath + outputDirectory)
    with open(absoluteFilePath + inputFileName) as inputFileObject:
        group = grouper(lineCountChunkSize, inputFileObject, fillvalue=None)
        for i, g in enumerate(group):
            with tempfile.NamedTemporaryFile('w', delete=False) as fout:
                for j, line in enumerate(g, 1):  # count number of lines in group
                    if line is None:
                        j -= 1  # don't count this line
                        break
                    fout.write(line)
            os.rename(fout.name, (absoluteFilePath + outputDirectory + _fileName + '_{0}' + _fileExtention).format(i))
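For comparison, here is a minimal sketch of a variant that never builds a whole group in memory. It swaps zip_longest for itertools.islice, which pulls lines from the file lazily, so only one line at a time should need to be held. The function name split_by_line_count, its parameters, and the output naming scheme are hypothetical, and I have not verified that this resolves the MemoryError above:

```python
import os
from itertools import islice


def split_by_line_count(input_path, output_dir, lines_per_chunk):
    """Write consecutive runs of lines_per_chunk lines to numbered files.

    Hypothetical helper: the names and the output naming scheme are
    assumptions made for this sketch. Returns the list of paths written.
    """
    os.makedirs(output_dir, exist_ok=True)
    base, ext = os.path.splitext(os.path.basename(input_path))
    written = []
    with open(input_path) as src:
        index = 0
        while True:
            # islice is lazy: lines are read from the file only as they are consumed.
            chunk = islice(src, lines_per_chunk)
            first_line = next(chunk, None)
            if first_line is None:
                break  # input exhausted, so no empty trailing file is created
            out_path = os.path.join(output_dir, "{0}_{1}{2}".format(base, index, ext))
            with open(out_path, "w") as dst:
                dst.write(first_line)
                dst.writelines(chunk)  # streams the rest of this chunk line by line
            written.append(out_path)
            index += 1
    return written
```

A call such as split_by_line_count("big.txt", "chunks", 1000000) would write chunks/big_0.txt, chunks/big_1.txt, and so on; the file name and line count here are placeholders.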
For more background on the code, see: Splitting large text file into smaller text files by line numbers using Python
Version: Python 3.4
OS: Windows 7
Architecture: 64-bit
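Since a 64-bit OS can also run a 32-bit Python build, and a 32-bit process has a much smaller address space than the installed RAM, it may be worth confirming which build the interpreter is. A minimal sketch using only the standard library (the variable name pointer_bits is just for illustration):

```python
import platform
import struct
import sys

# Size of a C pointer in the running interpreter, in bits (32 or 64).
pointer_bits = struct.calcsize("P") * 8

print("Interpreter pointer size:", pointer_bits, "bit")
print("sys.maxsize > 2**32:", sys.maxsize > 2**32)
print("platform.architecture():", platform.architecture()[0])
```

If this reports 32 bit, the limit would come from the interpreter build rather than from the amount of RAM in the machine.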