I'm new to Python and I'm editing a program that needs to open a file, but the file is larger than 1.5 GB, so I get a memory error. The code is:

f=open('thumbdata3.dat','rb')
tdata = f.read()
f.close()

# In Python 3, a file opened in 'rb' mode yields bytes, so the markers must be bytes too
ss = b'\xff\xd8'
se = b'\xff\xd9'

count = 0
start = 0
while True:
    x1 = tdata.find(ss, start)
    if x1 < 0:
        break
    x2 = tdata.find(se, x1)
    if x2 < 0:
        break  # no end marker after this start marker
    jpg = tdata[x1:x2 + 2]  # include both bytes of the end marker
    count += 1
    fname = 'extracted%03d.jpg' % count
    with open(fname, 'wb') as fw:
        fw.write(jpg)
    start = x2 + 2

This raises a

MemoryError

at the line

tdata = f.read()

How can I modify the code so the file is split into pieces as it is read, instead of being loaded all at once?

ᾯᾯᾯ
  • you may find something useful here: https://stackoverflow.com/questions/1035340/reading-binary-file-and-looping-over-each-byte – Corley Brigman May 14 '18 at 14:52
  • Could you give us the full error message? I'm sure there was text after `MemoryError` – Flimm May 14 '18 at 14:53
  • Split the file in what way? – martineau May 14 '18 at 14:58
  • Since you want to take a chunk out of the file, you might want to look at using https://docs.python.org/3.5/library/mmap.html. You can probably use [rfind](https://docs.python.org/3.5/library/mmap.html#mmap.mmap.rfind) to find the border/delimiters you're looking for, and slice notation on the mmap itself once you have the indexes you want. – Tom Dalton May 14 '18 at 15:11
  • When I run the program from IDLE I get `Traceback (most recent call last): File "...\prog.py", line 6, in <module> tdata = f.read() MemoryError` – ᾯᾯᾯ May 14 '18 at 15:14
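
The `mmap` suggestion in the comments could look roughly like the sketch below. It keeps the question's search loop but runs it over a memory-mapped view of the file, so the operating system pages data in on demand instead of Python reading everything at once. The names `extract_jpegs`, `prefix`, `SOI` and `EOI` are illustrative and not from the original post. On a 32-bit Python build the mapping itself may still fail for a file this large.

```python
import mmap

SOI = b'\xff\xd8'  # start marker used in the question
EOI = b'\xff\xd9'  # end marker used in the question


def extract_jpegs(path, prefix='extracted'):
    """Write each SOI..EOI span found in `path` to its own file; return the count.

    `extract_jpegs` and `prefix` are hypothetical names for this sketch.
    """
    count = 0
    with open(path, 'rb') as f:
        # Length 0 maps the whole file (an empty file raises ValueError).
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            start = 0
            while True:
                x1 = mm.find(SOI, start)
                if x1 < 0:
                    break  # no further start marker
                x2 = mm.find(EOI, x1 + 2)
                if x2 < 0:
                    break  # start marker without a matching end marker
                count += 1
                with open('%s%03d.jpg' % (prefix, count), 'wb') as out:
                    # Slicing copies only this one image into memory.
                    out.write(mm[x1:x2 + 2])
                start = x2 + 2
    return count
```

Calling `extract_jpegs('thumbdata3.dat')` would then write `extracted001.jpg`, `extracted002.jpg`, and so on into the current directory.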

1 Answer

From the description, the memory footprint seems to be the problem. A generator can reduce it by loading only the piece of data currently in use, one piece at a time. The example below splits a large text file into smaller files of l lines each:

from itertools import chain, islice

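def piecewise(iterable, n):
    "piecewise('Python', 2) => Py th on"
    iterable = iter(iterable)
    while True:
        try:
            first = next(iterable)
        except StopIteration:
            return  # PEP 479: a StopIteration escaping here is a RuntimeError in Python 3.7+
        yield chain([first], islice(iterable, n - 1))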

l = ...
file_large = 'large_file.txt'
with open(file_large) as bigfile:
    for i, lines in enumerate(piecewise(bigfile, l)):
        file_split = '{}.{}'.format(file_large, i)
        with open(file_split, 'w') as f:
            f.writelines(lines)
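
The code above splits a text file by line count. For the binary file in the question, the same generator idea might be adapted as in the following sketch, which reads fixed-size chunks and yields one start-marker-to-end-marker span at a time. The names `iter_jpegs` and `chunk_size` are assumptions made for illustration. The buffer holds at most one incomplete image plus one chunk, unless a start marker is never followed by an end marker, in which case the rest of the file accumulates.

```python
def iter_jpegs(path, chunk_size=1024 * 1024):
    """Yield each b'\\xff\\xd8' ... b'\\xff\\xd9' span in `path`, chunk by chunk.

    `iter_jpegs` and `chunk_size` are hypothetical names for this sketch.
    """
    soi, eoi = b'\xff\xd8', b'\xff\xd9'
    buf = b''
    with open(path, 'rb') as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break  # end of file
            buf += chunk
            while True:
                x1 = buf.find(soi)
                if x1 < 0:
                    # Keep the last byte: it may be the first half of a marker.
                    buf = buf[-1:]
                    break
                x2 = buf.find(eoi, x1 + 2)
                if x2 < 0:
                    # Image is incomplete: drop what precedes it and read more.
                    buf = buf[x1:]
                    break
                yield buf[x1:x2 + 2]
                buf = buf[x2 + 2:]

# Possible usage with the file name from the question:
# for i, jpg in enumerate(iter_jpegs('thumbdata3.dat'), 1):
#     with open('extracted%03d.jpg' % i, 'wb') as fw:
#         fw.write(jpg)
```

Because the unfinished tail of each chunk is carried over, markers that straddle a chunk boundary are still found.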
Axecalever