As per the title, I need help with writing to a specific byte in a dump file. So far I'm able to read 512 bytes with the following code:

sectorcount = 0
with open('a2.dump', 'rb') as f:
    for chunk in iter(lambda: f.read(16), b''):
        # 16 bytes per chunk, i.e. 32 hex characters
        item = chunk.encode('hex')
        # split the hex string into 2-character pairs, one pair per byte
        filtered_item = [item[i:i+2] for i in range(0, len(item), 2)]
        # to display in "hex" form
        # filtered_item[0] = "E5"

        print ' '.join(filtered_item)
        sectorcount = sectorcount + 1
        # 32 chunks of 16 bytes = one 512-byte sector; adjust accordingly
        if sectorcount == 32:
            break

The output was:

00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 77 8a 1c 22 00 00 00 21
03 00 83 37 ee fb 00 08 00 00 00 b8 3d 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 55 aa

As you can see, I need help editing one of those values in the output (e.g. changing the value "77" to "E1").

I tried opening the file using with open('a2.dump', 'wb') as f: but my dump file got emptied. I believe I need to write to the file, but I'm unsure how to write hex (i.e. raw binary) values in Python.

Appreciate any help in advance ! Thanks !

EDIT: As per James Sebastian's request, I created a .dump file and edited it in HexEdit, with my results shown above.

I then ran print repr(open('input.dump', 'rb').read()) and the result was:

'\x00w\x8a\x1c"\x00'

The corresponding expected output (the result after the replacements):

'\x00\xe1\x8a\x1c"\x00'
misctp asdas
  • I compared these results and they are similar in HexEdit. In my understanding 1 character in the result is 1 byte, so '00' in my results is 2 bytes – misctp asdas Nov 15 '15 at 01:52
  • I have understood the need to use 'ab+' because I need to append, which is why 'wb' got the file replaced and overwritten. The question now is how to write to each specific byte? – misctp asdas Nov 15 '15 at 01:52
  • I need to edit the file contents. Refer to my results above, assuming that you need to replace one of the pairs (e.g. replace "77" with "E1"). How do I write "E1" to replace "77"? – misctp asdas Nov 15 '15 at 01:59
  • How does it not? All I have to do is find the hexadecimal value "77" within the file and replace it with "E1". How do I actually replace that hexadecimal value with another value within the file? Plain enough? – misctp asdas Nov 15 '15 at 02:04
  • Why do you need to modify the original file? Why can't you read from the old file and write to a new file? Is the file huge? – PM 2Ring Nov 15 '15 at 02:07
  • Yes it is. As a matter of fact it is 2GB + – misctp asdas Nov 15 '15 at 02:09
  • Ok. In that case it makes a lot of sense to modify the file in-place. :) – PM 2Ring Nov 15 '15 at 02:11
  • @J.F.Sebastian memory error – misctp asdas Nov 15 '15 at 02:16
  • sorry used wrong input file. see edits – misctp asdas Nov 15 '15 at 02:21
  • As well as my code below, you may also like to see an example I wrote a few months ago of in-place modification of a text file: http://stackoverflow.com/a/32098399/4014959 – PM 2Ring Nov 15 '15 at 03:05

2 Answers

1

Here's a short demo of doing hex search & replace in a binary file. I took a 32 byte excerpt of your data; here's its hex dump (produced using hd on Linux).

00000000  00 00 00 00 00 00 00 00  77 8a 1c 22 00 00 00 21  |........w.."...!|
00000010  03 00 83 37 ee fb 00 08  00 00 00 b8 3d 00 00 00  |...7........=...|
00000020

Here's the code:

fname = 'qdata'
with open(fname, 'r+b') as f:
    #save position of the start of the data block
    fprev = f.tell()
    stringdata = f.read(32)
    print stringdata.encode('hex')

    #replace \x77\x8a with \xe1\x8a; note that str.replace changes every
    #occurrence in the block (pass a count of 1 to change only the first)
    newdata = stringdata.replace('\x77\x8a', '\xe1\x8a')
    print newdata.encode('hex')

    #rewind file to the start of the data block
    f.seek(fprev)
    f.write(newdata)

Note that file mode is 'r+b'. This lets us read the file and also modify it. If you open it with a w mode the file is truncated, i.e., its previous contents get wiped out, and the file size is reset to zero. If you open it in an a mode the file pointer is positioned at the end of the file to allow data to be appended.
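To make the difference between the three modes concrete, here is a minimal sketch. It is written for Python 3 (the code in this answer is Python 2), and the scratch file name, its contents and the helper write_with_mode are made up for the demo.

```python
import os
import tempfile

# Hypothetical scratch file; name and contents are assumptions for this demo.
path = os.path.join(tempfile.mkdtemp(), 'modes.bin')
original = b'\x00\x77\x8a\x1c\x22\x00'

def write_with_mode(mode, payload):
    """Reset the file, write payload using the given mode, return the file bytes."""
    with open(path, 'wb') as f:
        f.write(original)
    with open(path, mode) as f:
        f.write(payload)
    with open(path, 'rb') as f:
        return f.read()

truncated = write_with_mode('wb', b'\xe1')     # old contents are wiped first
appended = write_with_mode('ab', b'\xe1')      # new byte lands at the end
overwritten = write_with_mode('r+b', b'\xe1')  # byte 0 replaced, the rest kept

print(truncated.hex())    # e1
print(appended.hex())     # 00778a1c2200e1
print(overwritten.hex())  # e1778a1c2200
```

Only 'r+b' leaves the rest of the file untouched, which is why it is the mode to use for in-place edits.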

Here's the output that the above code prints:

0000000000000000778a1c220000002103008337eefb0008000000b83d000000
0000000000000000e18a1c220000002103008337eefb0008000000b83d000000

We don't need to do those .encode('hex') and print steps, they're purely informational, so we can see what the program's doing.

Here's the hexdump of the modified file:

00000000  00 00 00 00 00 00 00 00  e1 8a 1c 22 00 00 00 21  |..........."...!|
00000010  03 00 83 37 ee fb 00 08  00 00 00 b8 3d 00 00 00  |...7........=...|
00000020

In the above code I read the entire file contents into RAM; that's certainly not necessary, you can scan it block by block, or however you see fit. But you must do a file .seek() call in between file .read() and .write() operations.
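As a rough illustration of the block-by-block approach, here is a Python 3 sketch that replaces a single byte value throughout a file without loading it all into RAM. The function name replace_byte_in_place and the chunk size are my own assumptions. It deliberately handles only one-byte patterns, because a longer pattern could straddle a chunk boundary and would need extra care.

```python
import os
import tempfile

def replace_byte_in_place(path, old, new, chunk_size=64 * 1024):
    """Replace every occurrence of the single byte `old` with `new`.

    Returns the number of bytes changed. Hypothetical helper, not from the thread.
    """
    assert len(old) == 1 and len(new) == 1
    replaced = 0
    with open(path, 'r+b') as f:
        while True:
            start = f.tell()                # where this chunk begins
            chunk = f.read(chunk_size)
            if not chunk:
                break
            hits = chunk.count(old)
            if hits:
                f.seek(start)               # seek between the read and the write
                f.write(chunk.replace(old, new))
                f.seek(start + len(chunk))  # and again before the next read
                replaced += hits
    return replaced

# Demo on a small scratch file, with a tiny chunk size to force several chunks.
path = os.path.join(tempfile.mkdtemp(), 'demo.dump')
with open(path, 'wb') as f:
    f.write(b'\x00\x77\x8a\x1c\x22\x00' * 3)

count = replace_byte_in_place(path, b'\x77', b'\xe1', chunk_size=4)
with open(path, 'rb') as f:
    result = f.read()
print(count, result.hex())
```

Chunks that contain no match are never written back, so a mostly-zero dump is only read, not rewritten.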

Also, be very careful that you get the positioning correct. And don't accidentally write the wrong data length. It won't change the file length, but it can still make a mess of your file if your replacement data isn't the length you think it is.
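A small Python 3 sketch of what "the wrong data length" does in practice; the scratch file and the overwrite helper are assumptions for the demo. Writing in 'r+b' mode overwrites rather than inserts, so a replacement that is too long silently clobbers the bytes after it. The length only stays the same as long as the write stays inside the existing file; a write that runs past the end extends it.

```python
import os
import tempfile

# Hypothetical scratch file for the demo.
path = os.path.join(tempfile.mkdtemp(), 'length.bin')
with open(path, 'wb') as f:
    f.write(b'\x00\x77\x8a\x1c\x22\x00')

def overwrite(offset, data):
    """Write data at offset in 'r+b' mode and return the whole file afterwards."""
    with open(path, 'r+b') as f:
        f.seek(offset)
        f.write(data)
    with open(path, 'rb') as f:
        return f.read()

same_length = overwrite(1, b'\xe1')        # intended 1-byte edit
too_long = overwrite(1, b'\xe1\xff\xff')   # 3 bytes: also clobbers 8a 1c
past_end = overwrite(5, b'\xaa\xbb')       # runs past the end: file grows by 1

print(same_length.hex())  # 00e18a1c2200
print(too_long.hex())     # 00e1ffff2200
print(past_end.hex())     # 00e1ffff22aabb
```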


Here's a function that modifies file data at a given offset. Because its action is potentially dangerous the function prompts the user to make sure that the correct data is being overwritten. In the test code I use the same 32 byte file as before, overwriting the 3 bytes '\x83\x37\xee' at offset 0x12.

def binedit(fname, offset, newdata):
    with open(fname, 'r+b') as f:
        #Show current contents
        f.seek(offset)
        stringdata = f.read(len(newdata))
        print 'Current data:'
        print '%08X: %s\n' % (offset, stringdata.encode('hex'))

        prompt = 'Replace with %s ? (y/N) ' % newdata.encode('hex')
        s = raw_input(prompt)
        if s != 'y':
            print 'Aborting'
            return

        #Replace data at offset with newdata
        f.seek(offset)
        f.write(newdata)


fname = 'qdata'
offset = 0x12
newdata = 'dead42'.decode('hex')
binedit(fname, offset, newdata)

Output:

Current data:
00000012: 8337ee

Replace with dead42 ? (y/N) y

The "before" and "after" hex dumps:

00000000  00 00 00 00 00 00 00 00  77 8a 1c 22 00 00 00 21  |........w.."...!|
00000010  03 00 83 37 ee fb 00 08  00 00 00 b8 3d 00 00 00  |...7........=...|
00000020

00000000  00 00 00 00 00 00 00 00  77 8a 1c 22 00 00 00 21  |........w.."...!|
00000010  03 00 de ad 42 fb 00 08  00 00 00 b8 3d 00 00 00  |....B.......=...|
00000020
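The function above is Python 2 (print statement, raw_input, .encode('hex')). For Python 3, a rough equivalent might look like the sketch below. Note that it is not a straight port: instead of prompting, it takes the bytes you expect to find at the offset and refuses to write if they don't match. The name patch_at_offset and that expected-bytes check are my own assumptions, not part of the original function.

```python
import os
import tempfile

def patch_at_offset(fname, offset, expected, newdata):
    """Overwrite the bytes at offset with newdata, but only if they equal expected.

    Returns True if the file was modified. Hypothetical helper for illustration.
    """
    if len(expected) != len(newdata):
        raise ValueError('expected and newdata must have the same length')
    with open(fname, 'r+b') as f:
        f.seek(offset)
        current = f.read(len(newdata))
        if current != expected:
            return False          # wrong position or wrong file: leave it alone
        f.seek(offset)            # rewind before writing
        f.write(newdata)
    return True

# Recreate the same 32-byte test file in a scratch directory.
fname = os.path.join(tempfile.mkdtemp(), 'qdata')
with open(fname, 'wb') as f:
    f.write(bytes.fromhex('0000000000000000778a1c2200000021'
                          '03008337eefb0008000000b83d000000'))

first = patch_at_offset(fname, 0x12, bytes.fromhex('8337ee'), bytes.fromhex('dead42'))
second = patch_at_offset(fname, 0x12, bytes.fromhex('8337ee'), bytes.fromhex('dead42'))
with open(fname, 'rb') as f:
    patched = f.read()
print(first, second, patched[0x10:0x18].hex())  # True False 0300dead42fb0008
```

The second call returns False because the bytes at 0x12 are no longer 83 37 ee, which is the kind of mistake the check is meant to catch.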

Disclaimer: If you destroy valuable data using this code it's not my fault!

PM 2Ring
  • nice code. but is there a way where i can be specific about the positioning of my editing instead of searching for a pattern ? (e.g. i would like to edit at the position of \x01x003 which in this case is "83") – misctp asdas Nov 15 '15 at 05:52
  • @misctpasdas: Your position notation looks a little odd to me, but I'll add some new code to my answer. – PM 2Ring Nov 15 '15 at 14:20
-1

To replace a byte in a binary file, you don't need a hex dump. For example, to replace b'\x77' with b'\xE1':

#!/usr/bin/env python
import mmap
from contextlib import closing

with open('a2.dump', 'r+b') as file, \
     closing(mmap.mmap(file.fileno(), 0, access=mmap.ACCESS_WRITE)) as s:
    i = -1
    while 1:
        i = s.find(b'\x77', i+1)
        if i < 0: # not found
            break
        s[i] = b'\xE1'[0] # replace

It performs the replacements in place. It works for arbitrarily large files.

For example, if the input file is created using:

open('a2.dump','wb').write(b'\x00w\x8a\x1c"\x00')

then the output (after the 77 -> E1 replacement) is:

print(repr(open('a2.dump','rb').read()))
# -> b'\x00\xe1\x8a\x1c"\x00'

Notice that the 0x77 byte has been replaced with 0xE1.

See Python - How can I change bytes in a file.
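If you already know the position of the byte, or the OS refuses to map the whole file at once, you can map just a small window around the offset. Below is a minimal Python 3 sketch; the function name set_byte_at and the demo file are assumptions for illustration. The one real constraint is that the offset passed to mmap must be a multiple of mmap.ALLOCATIONGRANULARITY, so the code rounds down to the nearest aligned position and indexes from there.

```python
import mmap
import os
import tempfile

def set_byte_at(path, offset, value):
    """Set the byte at absolute file offset to value (0-255); return the old byte.

    Hypothetical helper: maps only the aligned window that contains the offset.
    """
    gran = mmap.ALLOCATIONGRANULARITY
    window_start = (offset // gran) * gran   # mmap offsets must be aligned
    delta = offset - window_start            # position inside the window
    with open(path, 'r+b') as f:
        size = os.fstat(f.fileno()).st_size
        if not 0 <= offset < size:
            raise IndexError('offset is outside the file')
        with mmap.mmap(f.fileno(), delta + 1, access=mmap.ACCESS_WRITE,
                       offset=window_start) as m:
            old = m[delta]
            m[delta] = value
    return old

# Demo: put the 0x77 byte beyond the first aligned window of a scratch file.
path = os.path.join(tempfile.mkdtemp(), 'a2.dump')
target = mmap.ALLOCATIONGRANULARITY + 8
with open(path, 'wb') as f:
    f.write(b'\x00' * target + b'\x77\x8a\x1c\x22')

old = set_byte_at(path, target, 0xE1)
with open(path, 'rb') as f:
    f.seek(target)
    tail = f.read()
print(hex(old), tail.hex())  # 0x77 e18a1c22
```

In Python 3 indexing an mmap gives and takes integers, which is why value is 0xE1 rather than a one-byte string.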

jfs
  • i need to operate on a no-dependencies basis where the script may be applied in any linux system with portability. as such i cant use this code as is has dependencies on other APIs – misctp asdas Nov 15 '15 at 06:06
  • @misctpasdas: have you tried to run it? It has no dependencies. `mmap` is in stdlib and it is available on Linux (among other platforms). – jfs Nov 15 '15 at 06:08
  • thanks. Is there any way i could edit the bytes based on location within the file instead of searching for a pattern ? – misctp asdas Nov 15 '15 at 06:34
  • @misctpasdas: yes. Look at `s[i]` in the code. `i` is the offset in bytes (an integer). – jfs Nov 15 '15 at 06:41
  • how about import closing ? is it part of stdlib as well ? – misctp asdas Nov 15 '15 at 06:46
  • btw this code didn't work as my file was 2GB big and it came back with the error WindowsError: [Error 8] Not enough storage is available to process this command – misctp asdas Nov 15 '15 at 06:48
  • @misctpasdas: (1) it is part of stdlib. Type `help('contextlib')` in Python console. (2) `mmap` works with large files even if they do not fit in memory. (I've just tried a 50GB file and it works ok). If it doesn't work for you then create a separate question: provide the *exact* code that you use (a minimal code that shows the issue), provide the full traceback of the error, run `python -mplatform` and add it to your new question. There are workarounds if your OS doesn't allow you to mmap the whole file at once (you could pass offset + length) – jfs Nov 15 '15 at 07:06