Python getting specific bytes in a large hex number

Question

Suppose I have a large hex number, 0x1234567890ABCDEF1234567890ABCDEF1234567890ABCDEF1234567890ABCDEF1234567890ABCDEF1234567890ABCDEF

and I want to extract bytes 10-20 from this number easily. How would I accomplish that? I know I can drop the low bytes by shifting right by 10*8 bits, but the more significant bytes are still left in the result.

It's just a string of hex digits? https://stackoverflow.com/q/663171 — Robert Harvey, Aug 03 '18 at 18:05
It's simple mathematics: `index = (pos*2)` with base pos = 1. So first byte is `pos=1` and hence `index=2` in a zero-based array. — zx485, Aug 03 '18 at 18:05
You could use a bit mask to AND the bit shifted value to only get the bits you need. — Karl, Aug 03 '18 at 18:09
but my bit mask will be huge @Karl, it would be something like 0xFFFFFFFFFFFFFFFFFFFF. Bytes 10-20 is just an example, I want to do this with around 40 bytes — jimmyhuang0904, Aug 03 '18 at 18:39
@RobertHarvey no sorry, it's not actually a string. It's just a huge number (Python supports arbitrarily large integers), so I can do operations on it as if it were a number — jimmyhuang0904, Aug 03 '18 at 18:40

Pavlo Pravdiukov · Accepted Answer · 2018-08-04T23:53:38.067

The easiest way is to use string slices. Since the lowest byte is on the far right and the highest is on the left, we can utilise negative indexes.

def sub_bytes(i, start=0, end=0):
    i_str = hex(i)[2:]  # skip the 0x prefix
    stop = max(len(i_str) - start * 2, 0)  # clamp: a negative stop would wrap around
    i_sub = i_str[-end * 2: stop]  # get the bytes we need
    return int(i_sub or '0', 16)  # convert back to int

len(i_str) is used for the upper bound so that start=0 works: a plain negative index of -0 would give an empty slice.

Let's try with your value

In [2]: value = 0x1234567890ABCDEF1234567890ABCDEF1234567890ABCDEF1234567890ABCDEF1234567890ABCDEF1234567890ABCDEF

In [3]: sub_bytes(value, 0, 3)
Out[3]: 11259375

In [4]: hex(sub_bytes(value, 0, 3))
Out[4]: '0xabcdef'

In [6]: hex(sub_bytes(value, 10, 20))
Out[6]: '0x90abcdef1234567890ab'

In [7]: hex(sub_bytes(value, 45))
Out[7]: '0x123456'

If a requested slice is empty or out of range I return 0x0 here, but you may raise IndexError if you like.
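
If you prefer the exception, a stricter variant could look like the sketch below. The name sub_bytes_strict is hypothetical, and the clamp on the upper bound is my assumption about how out-of-range starts should be treated:

```python
def sub_bytes_strict(i, start=0, end=0):
    # Hypothetical stricter variant of sub_bytes: raises instead of returning 0.
    i_str = hex(i)[2:]  # skip the 0x prefix
    # Clamp the upper bound so an out-of-range start cannot wrap around
    # into a negative index.
    stop = max(len(i_str) - start * 2, 0)
    i_sub = i_str[-end * 2: stop]
    if not i_sub:
        raise IndexError("no bytes in the slice")
    return int(i_sub, 16)


print(hex(sub_bytes_strict(0xabcdef, 0, 3)))  # 0xabcdef
```

A slice whose bytes are all zero still returns 0; only an empty slice raises.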

UPDATE

In Python 3.2+, int has to_bytes and from_bytes methods, which are more efficient and handier for this case:

import math

def slice_bytes(value, a=None, b=None, byteorder='little'):
    size = math.ceil(value.bit_length() / 8)
    value_bytes = value.to_bytes(size, byteorder)
    return int.from_bytes(value_bytes[a: b], byteorder)

After some performance testing on the number 7 ** 7 ** 7, which has 288998 bytes, I found slice_bytes to be faster than the direct approach by Karl. sub_bytes is apparently slower.
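
A comparison along these lines can be reproduced with timeit. This is only a sketch: shift_mask is a hypothetical stand-in for the shift-and-mask approach, the slice 10:50 is an arbitrary choice, and the timings will vary with machine and Python version.

```python
import timeit


def slice_bytes(value, a=None, b=None, byteorder='little'):
    # Same idea as above; integer arithmetic replaces math.ceil.
    size = (value.bit_length() + 7) // 8
    return int.from_bytes(value.to_bytes(size, byteorder)[a:b], byteorder)


def shift_mask(value, start, end):
    # Hypothetical helper for comparison: shift right, then mask.
    return (value >> (start * 8)) & ((1 << ((end - start) * 8)) - 1)


big = 7 ** 7 ** 7  # the test number mentioned above

# Both approaches should agree before their timings are compared.
assert slice_bytes(big, 10, 50) == shift_mask(big, 10, 50)

for func in (slice_bytes, shift_mask):
    seconds = timeit.timeit(lambda: func(big, 10, 50), number=100)
    print(f"{func.__name__}: {seconds:.4f} s for 100 calls")
```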

Thanks! This works for now, I wanted to see if there was any more efficient ways from using bit manipulation but I guess not :) — jimmyhuang0904, Aug 03 '18 at 20:10
How would you actually raise the IndexError? right now you're returning 0 by default in your return statement — jimmyhuang0904, Aug 03 '18 at 20:43
I would write `if not i_sub: raise IndexError("no bytes in the slice")` or something like that right before the return statement — Pavlo Pravdiukov, Aug 04 '18 at 21:36

score 0 · Answer 2 · answered Aug 03 '18 at 23:20

Instead of messing with strings and substrings, I feel the bit mask approach is a more direct way of getting the bits you need. In your comment you mentioned that the bit mask would be very big. That is true, but it is not an issue for the program, since Python integers can be arbitrarily large.

I have an example function which can make a mask for you depending on how many bytes you want to get from the data. Then you simply AND that mask with the right shifted value to get the value you want.

Say you want to get 4 bytes of data starting from byte index 2:

def get_bytes(value, start, amount):  
    shifted_value = value >> (start * 8) # Multiply by 8 for how much to right shift
    mask = make_mask(amount) 
    return shifted_value & mask

def make_mask(byte_amount):
    if byte_amount > 0:
        bin_string = '1' * (byte_amount * 8)  # Create binary string mask
    else:
        bin_string = '0'  # Make result 0
    return int(bin_string, 2)  # Return integer representation

value = 0x1234567890ABCDEF1234567890ABCDEF
result = get_bytes(value, 2, 4)

The result is 1450741931 in decimal, which is 0x567890ab in hex.
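
The mask can also be built with integer arithmetic instead of a binary string, which may be closer to the bit manipulation asked about in the question. A minimal sketch; get_bytes_shift is a hypothetical name, not part of the code above:

```python
def get_bytes_shift(value, start, amount):
    # Hypothetical variant of get_bytes: the mask comes from a shift,
    # e.g. amount=4 gives (1 << 32) - 1 == 0xFFFFFFFF.
    if amount <= 0:
        return 0  # mirror make_mask: a non-positive amount yields 0
    mask = (1 << (amount * 8)) - 1
    return (value >> (start * 8)) & mask


value = 0x1234567890ABCDEF1234567890ABCDEF
print(hex(get_bytes_shift(value, 2, 4)))  # 0x567890ab
```

For the 40-byte case from the comments, the mask is simply (1 << 320) - 1, which Python handles like any other integer.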
