I'm trying to split a very large text file (~8 GB) into smaller files of 1 GB each. When I try to get the chunks using zip_longest, it raises a MemoryError, as shown below:

Traceback (most recent call last):
File "splitFile.py", line 125, in <module>
    splitFileIntoChunk(lineNumFileChunk, baseDirectoryPath, inputFile);
File "splitFile.py", line 63, in splitFileIntoChunk
    for i, g in enumerate(group):
MemoryError

The same code works fine for an 800 MB file.

The 8 GB file is also split successfully into chunks of 512 MB each; the error appears only when creating chunks of 1 GB.

Is this related to a maximum amount of RAM allocated to the Python interpreter? If so, how can that limit be changed?

For the record, my system has 16 GB of RAM and an i7 processor. The RAM is not fully consumed either: Resource Monitor shows around 2-4 GB of RAM in use.
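
A 64-bit OS does not guarantee a 64-bit Python build, and a 32-bit process is limited to a few GB of address space no matter how much RAM is installed. A minimal sketch of how the running interpreter could be checked, using only the standard library (the function name `interpreter_bits` is made up for illustration):

```python
import struct
import sys


def interpreter_bits():
    """Return the pointer width of the running interpreter in bits (32 or 64)."""
    # "P" is the struct format code for a native pointer (void *).
    return struct.calcsize("P") * 8


if __name__ == "__main__":
    print("Python build:", interpreter_bits(), "bit")
    # sys.maxsize exceeds 2**32 only on 64-bit builds.
    print("sys.maxsize > 2**32:", sys.maxsize > 2**32)
```

If this reports 32, the limit may come from the interpreter build rather than from the machine; that would be one possible explanation, not a confirmed cause.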

Here is some code for reference:

from itertools import zip_longest

def grouper(n, inputFileInterable, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    # grouper(3, 'ABCDEFG', 'x') --> ABC DEF Gxx
    # The same iterator is repeated n times, so each tuple takes n consecutive items
    args = [iter(inputFileInterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)
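
For illustration, a small standalone run of the same recipe shows what each iteration yields: a fully built tuple of n items. With n set to the number of lines per chunk, one whole chunk of line strings is held in memory at once, which is a plausible place for the memory to go. The parameter name `iterable` is a simplification used only in this sketch.

```python
from itertools import zip_longest


def grouper(n, iterable, fillvalue=None):
    """Collect data into fixed-length tuples, padding the last one."""
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)


groups = list(grouper(3, "ABCDEFG", "x"))
print(groups)
# [('A', 'B', 'C'), ('D', 'E', 'F'), ('G', 'x', 'x')]

# Every group is a complete tuple of exactly n items, not a lazy iterator.
print(all(isinstance(g, tuple) and len(g) == 3 for g in groups))
```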

And then:

import os
import tempfile

# outputDirectory is a module-level variable defined elsewhere in splitFile.py (not shown here)

def splitFileIntoChunk(lineCountChunkSize, absoluteFilePath, inputFileName):

    _fileName, _fileExtention = os.path.splitext(inputFileName)

    if not os.path.exists(absoluteFilePath + outputDirectory):
        os.makedirs(absoluteFilePath + outputDirectory)

    with open(absoluteFilePath + inputFileName) as inputFileObject:
        # Each group is a tuple of lineCountChunkSize lines (padded with None at the end)
        group = grouper(lineCountChunkSize, inputFileObject, fillvalue=None)
        for i, g in enumerate(group):
            with tempfile.NamedTemporaryFile('w', delete=False) as fout:
                for j, line in enumerate(g, 1):  # count number of lines in group
                    if line is None:
                        j -= 1  # don't count this line
                        break
                    fout.write(line)
            # Move the finished temp file to its final numbered name
            os.rename(fout.name, (absoluteFilePath + outputDirectory + _fileName + '_{0}' + _fileExtention).format(i))
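
For comparison, a minimal sketch of a variant that streams lines with itertools.islice instead of building a tuple per chunk, so only one line at a time needs to be in memory. This is not the code from splitFile.py: the function name `split_by_line_count` and its parameters are hypothetical, and it writes the output files directly rather than going through tempfile.

```python
import itertools
import os


def split_by_line_count(input_path, output_dir, lines_per_chunk):
    """Hypothetical helper: split input_path into numbered files of at most
    lines_per_chunk lines each, and return the list of paths written."""
    os.makedirs(output_dir, exist_ok=True)
    base, ext = os.path.splitext(os.path.basename(input_path))
    written = []
    with open(input_path) as src:
        for index in itertools.count():
            # islice is lazy: it pulls lines from src only as they are consumed.
            chunk = itertools.islice(src, lines_per_chunk)
            first = next(chunk, None)
            if first is None:
                break  # input exhausted, avoid creating an empty file
            out_path = os.path.join(output_dir, "{0}_{1}{2}".format(base, index, ext))
            with open(out_path, "w") as dst:
                dst.write(first)
                dst.writelines(chunk)  # writes the remaining lines of this chunk
            written.append(out_path)
    return written
```

Whether this avoids the MemoryError in the setup described above is untested here; it only removes the per-chunk tuple.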

For more background on this code, see: Splitting large text file into smaller text files by line numbers using Python

Version: Python 3.4
OS: Windows 7
Architecture: 64-bit

HVT7