
I’m writing bytes 9, 10, and 13 to a file in binary mode, but they display as \t, \n, \r. I don’t understand why.

# opening file in binary write mode 
with open("C://Users//lenovo//Desktop//sample.txt",'bw') as mm:
    for i in range(17):
        mm.write(bytes([i]))


#opening file in binary read mode
with open("C://Users//lenovo//Desktop//sample.txt",'br') as mp:
    for i in mp:
        print(i)

Output:

b'\x00\x01\x02\x03\x04\x05\x06\x07\x08\t\n'
b'\x0b\x0c\r\x0e\x0f\x10'
Tom Zych
  • Because Python substitutes the known escape letters for their hex representation when you print them out: \n == \x0a, \t == \x09, etc. See https://www.asciitable.com/. Open the file in a hex editor and you can indeed see there is 0x10 in it ... – Patrick Artner Jul 29 '18 at 11:26
  • Possible duplicate of [What's the correct way to convert bytes to a hex string in Python 3?](https://stackoverflow.com/questions/6624453/whats-the-correct-way-to-convert-bytes-to-a-hex-string-in-python-3) ... I cannot find the dupe where I read about \t == \x09 on printout some time ago. – Patrick Artner Jul 29 '18 at 11:36


Expanding on Patrick’s first comment:

The file does, in fact, contain exactly the bytes you wrote. The only problem is that Python does not print them the way you expect.

A tab character in ASCII is 0x09. A newline is 0x0a. A carriage return is 0x0d. When Python displays a bytes object, it uses the short escape sequences \t, \n and \r for those common characters instead of hex.

The other values you wrote (0x00 through 0x10, apart from those three) are non-printable and have no such shortcut in the bytes representation, so Python shows them as \x hex escapes, as expected.
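A quick way to convince yourself that only the display differs is to compare the two spellings directly. This sketch uses nothing but built-in bytes behavior; the variable name `data` is just for illustration.

```python
# The short escapes and the hex escapes denote the very same byte values.
assert b'\t' == b'\x09'
assert b'\n' == b'\x0a'
assert b'\r' == b'\x0d'

data = bytes([9, 10, 13])
print(data)        # the repr uses the short escapes: b'\t\n\r'
print(list(data))  # the underlying integers: [9, 10, 13]

# Bytes 11 and 12 get no short escape in the bytes repr, so they stay as hex.
print(bytes([11, 12]))  # b'\x0b\x0c'
```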

Observe, also, that you are reading one line at a time (for i in mp:). Iterating over a file ends each line just after a newline byte, so the first line stops at \n and the rest of the file comes back as a second line.
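The line splitting can be reproduced without touching the disk. This is a sketch that assumes an in-memory `io.BytesIO` buffer is an acceptable stand-in for sample.txt; it holds the same 17 bytes the question writes.

```python
import io

# In-memory binary stream standing in for sample.txt (bytes 0..16).
buf = io.BytesIO(bytes(range(17)))

# Iterating a binary stream yields "lines", each ending just after a b'\n' byte.
lines = list(buf)
for line in lines:
    print(line)

# Reading everything at once returns a single bytes object with no splitting.
buf.seek(0)
whole = buf.read()
print(whole == bytes(range(17)))  # True
```

The two printed lines match the output in the question, which shows the split comes from line iteration, not from the file contents.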

I will assume that you want to, first, print everything in hex, and second, print it in reasonably sized chunks per line, instead of breaking on newlines. Here’s some code to do that.

To read equal-sized chunks of the file, we just use the read method of the file object. I think it’s probably most readable to leave the 0x out, but insert spaces between the bytes. We can most easily do this using string formatting.

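bytes_per_line = 8

with open(r'C:\Users\lenovo\Desktop\sample.txt', 'rb') as mp:
    while True:
        b = mp.read(bytes_per_line)  # up to 8 bytes; b'' at end of file
        if not b:
            break
        # One '{:02x}' (two-digit, zero-padded hex) per byte, minus the trailing space
        fmt = ('{:02x} ' * len(b))[:-1]
        # Iterating over bytes gives ints, which '{:02x}' formats as hex
        print(fmt.format(*b))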

Notes:

  • The r prefix on the filename makes it a raw string, so we don’t have to double the backslashes
  • The read method returns up to bytes_per_line (8) bytes per call
  • Near EOF, we may get fewer than 8 bytes, depending on the file size
  • At EOF, we get an empty bytes object (b''), which is falsy, so we break out of the loop
  • fmt is something like '{:02x} {:02x} {:02x} {:02x}', one format specifier per byte
  • [:-1] removes the trailing space (this is slicing; read up on it if it is unfamiliar)
  • We call the format method on fmt to apply it to *b
  • *b means to pass the bytes of b as individual arguments, corresponding to the format specifiers in fmt
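On Python 3.8 or newer, the same chunked hex dump can be written more compactly. This is a sketch, not a replacement for the loop above: `hex_lines` is a hypothetical helper name, `bytes.hex(sep)` requires Python 3.8+, and an `io.BytesIO` buffer stands in for the real file.

```python
import io
from functools import partial

def hex_lines(stream, bytes_per_line=8):
    """Hypothetical helper: yield one space-separated hex string per chunk."""
    # iter(callable, sentinel) keeps calling stream.read(bytes_per_line)
    # until it returns b'' (end of file), like the while/break loop above.
    for chunk in iter(partial(stream.read, bytes_per_line), b''):
        # bytes.hex(sep) puts sep between each byte's two hex digits (3.8+).
        yield chunk.hex(' ')

# In-memory stand-in for sample.txt; with a real file, pass the object from open(path, 'rb').
demo = io.BytesIO(bytes(range(17)))
result = list(hex_lines(demo))
for line in result:
    print(line)
```

This prints 00 01 02 03 04 05 06 07, then 08 09 0a 0b 0c 0d 0e 0f, then 10 on a line by itself.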
Tom Zych
  • Judging by the OP's code snippets, he/she is not as experienced in Python as you are. I'd recommend adding comments to help him/her understand. – addohm Jul 29 '18 at 22:11