Ok, so I am in a bit of a weird parsing scenario, but here it goes.

I have a script that reads in the bytes it needs to parse. I need to parse out those bytes and then return them.

Example

-------------------------------------------------------------------
Description: Log Parameters   : Byte Offset:  0
-------------------------------------------------------------------
-------------------------------------------------------------------
Description: Offset           : Byte Offset:  2-1
-------------------------------------------------------------------
-------------------------------------------------------------------
Description: Request Count    : Byte Offset:  3
-------------------------------------------------------------------
-------------------------------------------------------------------
Description: Reserved         : Byte Offset:  127-4
-------------------------------------------------------------------

So my script will eventually have the ability to output the hex associated with each line. For now, I need to say, ok, Byte offset is 0, go get the first byte and return it in hex. Ok, byte offset is 127-4, go get that, print the hex value right there on the screen.

The format is 128 bytes of hex (byte offsets 0 through 127) stored in a string.

HEX String

100000000000000220000000000000003000000000000000
000000000000000000000000000000000000000000000000
000000000000000000000000000000000000000000000000
000000000000000000000000000000000000000000000000
000000000000000000000000000000000000000000000000
0000000000000000

The 0x prefix has been stripped, and the remaining digits are stored in a string.

There are a lot of zeroes in this example, but this is just a random case. The byte offsets can fluctuate, so I'm trying to figure out how to basically iterate through an array of byte offsets and parse them incrementally.

It's weird to me that if a description takes up so many bytes, bitwise operations become more difficult because I can't split these up into 32 or even 64 bit blocks.

What I Want

Currently I have an array of the byte offsets in the following form:

[0, 2-1, 3, 127-4]

I want to iterate through each of those byte offsets, parse the corresponding bytes out of the long hex string, and print them.

Question

How do I use the byte offsets from my array to parse the matching bytes out of the hex string?

Ryan
  • So your question is how to parse the list of byte offsets? Or how to use those to get the correct bytes from the string of hexadecimal bytes? – brenns10 Jul 15 '15 at 20:27
  • How are you storing the range? `2-1` is a subtraction in Python. – brenns10 Jul 15 '15 at 20:30
  • Their docs are weird, but it's actually 2-1 (2 and 1 included) – Ryan Jul 15 '15 at 20:32
  • @Ryan, StackOverflow is a question-and-answer site. Readers, such as yourself, ask questions. Other readers attempt to answer them. Your post is well-organized, and you've clearly put some effort into it. However, it is still missing a key ingredient: a question. Do you have a specific question to ask? – Robᵩ Jul 15 '15 at 20:33
  • Sorry, I was trying to structure this as best I could. – Ryan Jul 15 '15 at 20:36

2 Answers

Say you have the starting byte number stored in a variable start, the ending byte number in a variable end, and the hex string in a variable string.

Since every byte is two hexadecimal digits, you can simply do this to get those bytes as a hexadecimal string:

string[start*2:(end+1)*2]

You need to do end+1 because it appears that your byte ranges are inclusive in your example, but Python slicing is exclusive on the end of the range. More on slicing if you're unfamiliar.

To make this concrete for you, here is a minimal working example. You may have to do parsing and massaging to get your ranges to look like mine, but this is the idea:

string = "100000000000000220000000000000003000000000000000" \
         "000000000000000000000000000000000000000000000000" \
         "000000000000000000000000000000000000000000000000" \
         "000000000000000000000000000000000000000000000000" \
         "000000000000000000000000000000000000000000000000" \
         "0000000000000000"

ranges = ['0', '2-1', '3', '127-4']

for offset in ranges:
    offset_list = offset.split('-')
    if len(offset_list) == 1:
        start = int(offset_list[0])
        end = int(offset_list[0])
    else:
        start = int(offset_list[1])
        end = int(offset_list[0])
    the_bytes = string[start*2:(end+1)*2]
    print('%d-%d: %s' % (start, end, the_bytes))

Output:

0-0: 10
1-2: 0000
3-3: 00
4-127: 00000002200000000000000030000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
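
The loop above can also be wrapped in two small helpers, so the range parsing yields (start, end) tuples that can be reused. This is only a sketch: parse_offset and extract are hypothetical names, the test data is made up so that the byte at offset i has the value i, and it assumes both ends of a range are inclusive, as in the question.

```python
def parse_offset(offset):
    """Turn '3' into (3, 3) and '127-4' into (4, 127)."""
    parts = [int(p) for p in offset.split('-')]
    # min/max works whether the range is written high-low or low-high.
    return min(parts), max(parts)


def extract(hex_string, offset):
    """Return the hex digits covering an inclusive byte range."""
    start, end = parse_offset(offset)
    # Two hex digits per byte; Python slices exclude the end index.
    return hex_string[start * 2:(end + 1) * 2]


# Made-up data: 128 bytes where the byte at offset i has the value i.
hex_string = ''.join('%02x' % i for i in range(128))

for offset in ['0', '2-1', '3', '127-4']:
    start, end = parse_offset(offset)
    print('%d-%d: %s' % (start, end, extract(hex_string, offset)))
```

Using min and max rather than fixed positions means a range written as '4-127' would be handled the same way as '127-4'.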
brenns10
  • I updated my answer with a more concrete example that you can directly run. Hopefully that'll help you a bit more! – brenns10 Jul 15 '15 at 20:41
  • Know the easiest way to split a string of 127-4 into a comma-sep. list like that? nvm, got it – Ryan Jul 15 '15 at 20:50
  • @Ryan `"127-4".split("-")` – Robᵩ Jul 15 '15 at 20:57
  • Hm, I am overcomplicating this. I am looking through each byte offset in a loop. As it iterates, I want to append '127-4' into the form [(4,127)] so the final array looks like the ranges example above – Ryan Jul 15 '15 at 21:07
  • Also slight problem in your example output. Bits are read in reverse order, so shouldn't I reverse the string? – Ryan Jul 15 '15 at 21:08
  • See my updated code, which deals with the ranges being strings, as you seem to be implying. Also, if you are saying that the bytes are numbered from 127 down to 0, then yes, you should reverse `string` before doing this. – brenns10 Jul 15 '15 at 21:12
  • Hm, I noticed a slight issue. The bits are printing in reverse order LSB - MSB when it should be MSB - LSB. Can this be fixed? I think it's ok to structure "the_bytes" as the_bytes[::-1] – Ryan Jul 15 '15 at 22:16
  • That is probably caused by the fact that you reversed the string. Since each byte is a pair of digits (say, AB), when you reversed the string, the digits were reversed as well (AB becomes BA). So, you should actually reverse the string in units of pairs. – brenns10 Jul 15 '15 at 23:23
  • Here is the shortest way I could come up with: `string = "".join(x for pair in reversed(list(zip(string[::2], string[1::2]))) for x in pair)` – brenns10 Jul 15 '15 at 23:24
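
The pairwise reversal described in the last two comments can also be written with the built-in bytes type, which may be easier to read than the zip one-liner. A minimal sketch, assuming Python 3.5 or later and a hex string with an even number of digits; reverse_bytes is a hypothetical helper name, not something from the thread:

```python
def reverse_bytes(hex_string):
    """Reverse the byte order of a hex string, keeping each digit pair intact."""
    # bytes.fromhex turns each pair of digits into one byte, [::-1] reverses
    # the byte sequence, and .hex() converts back to lowercase hex digits.
    return bytes.fromhex(hex_string)[::-1].hex()


print(reverse_bytes("0a1b2c"))  # 2c1b0a
```

A plain "0a1b2c"[::-1] would give "c2b1a0" instead, which is the digit-swapping problem mentioned in the comments.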
0
# Input: hex string, two digits per byte (whitespace is removed below)
x='''
100000000000000220000000000000003000000000000000
000000000000000000000000000000000000000000000000
000000000000000000000000000000000000000000000000
000000000000000000000000000000000000000000000000
000000000000000000000000000000000000000000000000
0000000000000000
'''

# Input: list of offsets
o = ['0', '2-1', '3', '127-4']

# Put everything in a more useful format
x = ''.join(x.split())
o = [item.split('-') for item in o]
o = [[int(item) for item in pair] for pair in o]
for pair in o:
    if len(pair) == 1:
        pair.append(pair[0])

# Display the values
for pair in o:
    # Each pair is [high, low], both inclusive; two hex digits per byte.
    print(pair, x[pair[1]*2:pair[0]*2+2])
Robᵩ
  • my output from this... slightly off maybe? – Ryan Jul 15 '15 at 20:52
  • [0, 0] 0x [2, 1] 0000 [3, 3] 00 [127, 4] 0000000000 – Ryan Jul 15 '15 at 20:53
  • Your input string apparently starts with `0x`, but the example input string in your question doesn't. If your actual data has an `0x`, or if your actual data is an `int` instead of a string, please update your question. – Robᵩ Jul 15 '15 at 20:55
  • Hm, the string I am dealing with is the block I have up there labeled HEX String. It's a string. – Ryan Jul 15 '15 at 21:04
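
The stray `0x` in the output quoted above suggests the real input may still have carried its prefix. A defensive sketch, assuming the data arrives as a string that can contain whitespace and an optional leading `0x`; normalize_hex is a hypothetical helper name, and rejecting odd-length input is a design choice rather than anything the thread asks for:

```python
def normalize_hex(raw):
    """Strip whitespace and an optional leading 0x/0X from a hex string."""
    digits = ''.join(raw.split())
    if digits[:2].lower() == '0x':
        digits = digits[2:]
    # An odd digit count means the bytes cannot be sliced in pairs.
    if len(digits) % 2:
        raise ValueError('odd number of hex digits: %d' % len(digits))
    return digits


print(normalize_hex('0x10 00\n00 02'))  # 10000002
```

Running the input through a step like this before slicing keeps byte offset 0 pointing at the first real data byte.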