I want to learn Python, but I am currently stuck. My goal is to read in a file and then compare 8 bytes of that file with some other 8 bytes. I read the whole file into memory, and now I want to iterate over the object and do the comparison in 8-byte chunks as an exercise.

This is my code:

import sys

with open("read.file", 'rb') as f:
    read_file = f.read()

i = 0
while i < len(read_file):
    chunk = read_file[i:i+8]
    print(sys.getsizeof(chunk))
    i += 8

I know I could just read 8 bytes at a time in the first loop and do the comparison there, but I am interested in whether there is a solution to this. When I run the code, sys.getsizeof(chunk) returns 41 bytes. Does anyone have an idea what I might have overlooked?

Mark Rotteveel
Steve
  • `sys.getsizeof` doesn't do what you think it does. You need `len` again. – user2357112 Oct 14 '21 at 14:44
  • Hello! Are you asking how to [idiomatically iterate over chunks of a sequence](https://stackoverflow.com/q/434287/11082165), or are you asking why `sys.getsizeof(chunk)` is 41 bytes instead of some other unstated value? Note by the way that `sys.getsizeof(bytes())` is 33 bytes on my system, so 41 bytes for storing a sequence of 8 bytes is entirely reasonable. – Brian61354270 Oct 14 '21 at 14:44
  • @user2357112supportsMonica You are right, I misused `sys.getsizeof`. @Brian I thought that `sys.getsizeof` would give me the size of that object, and that this would be 8 bytes since I loaded 8 bytes into it, which is not the case. Thanks to both of you. – Steve Oct 14 '21 at 16:21
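
To make the point from the comments concrete, here is a minimal sketch of the difference between `len` and `sys.getsizeof`. The sample value `chunk` is made up for illustration, and the exact `sys.getsizeof` figures depend on the Python implementation and platform, so none are hard-coded here.

```python
import sys

chunk = b"12345678"  # eight bytes of made-up example data

# len() counts the bytes stored in the object.
payload_len = len(chunk)

# sys.getsizeof() reports the memory footprint of the whole bytes object,
# including its header, so it is larger than the payload itself.
total_size = sys.getsizeof(chunk)
empty_size = sys.getsizeof(b"")

print(payload_len)              # 8
print(total_size - empty_size)  # extra memory attributable to the payload
```

For checking that a chunk holds 8 bytes, `len(chunk)` is the value to test.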

1 Answer

Say you have a byte array:

buf = b'irregular_length_byte_array' # len(buf) == 27
CHUNK_SZ = 4

Or in your case, a file that you read into a bytes object:

# f.read() in binary mode already returns bytes, so no extra conversion is needed
with open("read.file", "rb") as f:
    buf = f.read()

You could break it into chunks like this:

[buf[i:i + CHUNK_SZ] for i in range(0, len(buf), CHUNK_SZ)]

How I'd use it in practice:

# Step through b in strides of sz; the last chunk may be shorter than sz.
chunk_fn = lambda b, sz: [b[i:i + sz] for i in range(0, len(b), sz)]
chunk_fn(buf, CHUNK_SZ)

Which results in:

[b'irre', b'gula', b'r_le', b'ngth', b'_byt', b'e_ar', b'ray']
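
For the goal stated in the question, comparing each 8-byte chunk against some other 8 bytes, a generator avoids building the full list of chunks first. This is only a sketch: `iter_chunks`, `data`, and `TARGET` are hypothetical names and values chosen for illustration, not taken from the question.

```python
def iter_chunks(data: bytes, size: int):
    """Yield consecutive slices of `data`, each at most `size` bytes long."""
    for start in range(0, len(data), size):
        yield data[start:start + size]


# Hypothetical sample data and comparison target (not from the question).
data = b"AAAAAAAABBBBBBBBCCCC"
TARGET = b"BBBBBBBB"

# Collect the byte offset of every 8-byte chunk that equals TARGET.
matches = [
    index * 8
    for index, chunk in enumerate(iter_chunks(data, 8))
    if chunk == TARGET
]
print(matches)  # [8]
```

Comparing `bytes` objects with `==` compares their contents, so a short trailing chunk simply never matches an 8-byte target.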
darkpbj