I'm trying to write a file carver for JPEG images in Python, but it is turning out to be a lot harder than I expected.

As far as I can tell, the problem is that a single image can contain multiple SOI and EOI markers, because embedded thumbnails carry their own.

I need a way to separate the thumbnails from the actual images, but since the EOI marker is just FFD9, I'm finding that quite hard.

My code:

images = []  # (offset, length) pairs, one per candidate image

with open(r'\\.\X:', 'rb') as f:

    startfile = 0
    size = 0

    # Markers must be bytes because the drive is opened in binary mode
    start = b'\xFF\xD8\xFF\xE0'  # SOI followed by the APP0 (JFIF) marker
    end = b'\xFF\xD9'            # EOI

    chunksize = 512
    chunk = f.read(chunksize)

    while chunk:

        # Offset of this chunk; len(chunk) stays correct for a short final read
        base = f.tell() - len(chunk)

        s = chunk.find(start)
        e = chunk.find(end)

        if s >= 0: startfile = base + s
        if e >= 0: size = base + e + 2

        if startfile and size:
            eof = size - startfile
            images.append((startfile, eof))
            startfile = size = 0

        chunk = f.read(chunksize)

    for pos, item in enumerate(images):
        with open(str(pos) + '.jpg', 'wb') as o:
            f.seek(item[0])
            o.write(f.read(item[1]))
Leinad177
  • Might [this question](http://stackoverflow.com/questions/4585527/detect-eof-for-jpg-images) help? I don't know the ins and outs of the JPEG format but it looks like you might want to search for multiple occurrences of SOI and EOI and parse them as if you were parsing, say, paren pairs in a simple grammar like Lisp. – 2rs2ts Feb 25 '14 at 14:40
  • I already looked at that question and it covers my problem in one of the comments, but it doesn't provide a solution. As for simply searching for multiple occurrences: I'm reading raw from the drive, so I have to search the whole drive to find SOI and EOI markers, which makes it quite hard to match them up. – Leinad177 Feb 25 '14 at 14:48
  • Can you keep track in your code of whether you come across another SOI marker while your 'stream' is still open? If so, start another file and write to it until you reach the next EOI marker (which will give you the thumbnail), then resume the first output file until you hit the next EOI. –  Jun 07 '14 at 13:59
  • @m0atz I believe that's a similar method to how [Adroit](http://digital-assembly.com/products/adroit-photo-forensics/) handles it. It finds numerous possible endings then uses a statistical model (no doubt trained using thousands of carved images) to determine the most likely footer. Personally, I'd love to see a carver that allows me to dynamically change the footer to other 'possible' ones identified by the program to see the various outputted images. The program must of course deal with fragmentation and intelligently handle allocated data vs. file slack. – Bob Dylan Jan 12 '16 at 16:11
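
The pairing idea from the comments above (treat SOI/EOI like nested parentheses and keep only the outermost pair) could be sketched roughly as follows. This is only an illustration under some loud assumptions: the data is already in memory as a bytes object, the images are not fragmented, and no stray FFD8FF or FFD9 byte sequences occur inside the image's metadata segments. The function name carve_outermost is hypothetical and not from the question.

```python
SOI = b'\xFF\xD8\xFF'  # start of image plus the first byte of the next marker
EOI = b'\xFF\xD9'      # end of image


def carve_outermost(data):
    """Return (offset, length) pairs for the outermost SOI/EOI pairs in data.

    Assumes contiguous images and no false marker hits (see the note above).
    """
    images = []
    depth = 0   # number of SOI markers currently open
    start = 0   # offset of the outermost open SOI
    pos = 0     # where the next search begins
    while True:
        s = data.find(SOI, pos)
        e = data.find(EOI, pos)
        if depth == 0:
            # Nothing open: any EOI here is noise, so wait for the next SOI.
            if s < 0:
                break
            start = s
            depth = 1
            pos = s + 2
        else:
            if e < 0:
                break  # image never terminates in this buffer
            if 0 <= s < e:
                # Another SOI before the next EOI: an embedded thumbnail.
                depth += 1
                pos = s + 2
            else:
                depth -= 1
                pos = e + 2
                if depth == 0:
                    images.append((start, pos - start))
    return images
```

Each returned pair has the same (offset, length) shape as the entries in the question's images list, so the writing loop there would not need to change. A real carver would still have to deal with markers that straddle a read boundary and with fragmentation, as the last comment points out.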

0 Answers