0

I'm trying to print a list of bytes to a file in hex (or decimal; I believe it doesn't make a difference for the sake of the question). I'm baffled by how hard this appears to be in Python.

Based on what I found here (and on other SO questions), I already managed to do this:

>>> b
[12345, 6789, 7643, 1, 2]
>>> ' '.join(format(v, '02x') for v in b)
'3039 1a85 1ddb 01 02'

Now... this is already a good start, but... for larger numbers the individual bytes are grouped and attached to each other. This should not happen. There should be a space between them, just as there is between the last two numbers.

Of course, I could just do some string manipulation afterwards and insert some spaces, but... that sounds like a very hacky way, and I refuse to believe there isn't a cleaner way to do this.

So, what I want is this:

'30 39 1a 85 1d db 00 01 00 02'

How can I do this?

Opifex
  • 4
    Curious to know why you want to add spaces within each number. Won't that make it difficult to decode later, or does it not make any difference in your case? – shaik moeed Aug 30 '23 at 15:36
  • @shaikmoeed The data originates from a source where the bytes are also spaced apart, despite them being 2-byte ints. The destination this data needs to be written to also requires them to be spaced apart. You could ask "Then why bother converting them to ints at all?". This is because some processing was done in between that required them to become actual numbers with a correct value, not separate bytes. – Opifex Aug 30 '23 at 15:38
  • Note that this is in an embedded software context. This Python code will only be a development tool, but the output will be used in embedded software. It only accepts bytes. – Opifex Aug 30 '23 at 15:40
  • Have you checked this [SO answer](https://stackoverflow.com/a/25761534/8353711)? – shaik moeed Aug 30 '23 at 16:01
  • Should it really be `'... 01 02'` or rather `'... 00 01 00 02'`? – tobias_k Aug 30 '23 at 16:10
  • 1
    "So, what I want is this:" If you have this result, **why is it useful**? In particular: if you had the input `[48, 57, 26, 133, 29, 219, 1, 2]`, you would get the same result, yes? Or (assuming I didn't typo somewhere) `[3474837746526912770]`? How will you know which input resulted in the output? Or does it not matter? When encoding data, it's important to consider the protocol, not just the raw values. – Karl Knechtel Aug 30 '23 at 17:28
  • 1
    Aside from that: if "the output will be used in embedded software" that "only accepts bytes", then **why are these hex strings useful**? It sounds like you would be better served by directly creating the corresponding byte values, in a `bytes` object - that is, a value equivalent to `b'09\x1a\x85\x1d\xdb\x01\x02'`. Voted to close as needing clarity because the title asks about "a string of the individual bytes", but strings **do not store** bytes, and a string of the sort described **especially** is not doing so. – Karl Knechtel Aug 30 '23 at 17:30
  • 1
    The fact that you want `01` for `1` instead of `0001` is a big red flag. You're using a variable-length encoding with no way to determine value boundaries. There's no way to recover the original input. What you're doing is most likely going to lead to data corruption and data loss. – user2357112 Aug 30 '23 at 17:49
  • It *sounds* like what you **actually** want is just to write raw bytes to a file. In which case, you should open a file in binary mode, and just do `f.write(bytes(b))` – juanpa.arrivillaga Aug 30 '23 at 19:07
  • @shaikmoeed I hadn't. The only downside of that solution is that it's limited to 32-bit integers. Good thing that it already catches user2357112's problem, which is a thing in the example I gave in the question. – Opifex Aug 30 '23 at 20:15
  • @tobias_k Yes, it should be. My examples were fictitious and failed to take that corner case into account. – Opifex Aug 30 '23 at 20:16
  • @KarlKnechtel I said the *context* was embedded software. I never said this code was going to be used in embedded software. Your question could be answered in the comments, but it is not relevant for answering the question I asked. Your close vote is ridiculous. The 5 great answers given prove this. It's gatekeepers like you who are slowly killing StackOverflow. – Opifex Aug 30 '23 at 20:18
  • @user2357112 It is a red flag indeed. But only in the (flawed) example I gave in the Question. The real application does not use variable-length encoding. It's 32-bit fixed. (Even though the software is written to allow other lengths, in one stream the length is fixed) – Opifex Aug 30 '23 at 20:19
  • @juanpa.arrivillaga I'm opening a binary PGM file, performing some modifications on that file, and rewriting it in binary. No problems there. However, the contents of this modified PGM need to be used in somewhat human-readable format. For example, a C-byte-array. For this to work, I need to have the values byte-by-byte. It might not make sense to you, but this is 20 year old legacy code I'm working with. Some things can be changed, but most things can't. – Opifex Aug 30 '23 at 20:22
  • What is very confusing is that you *don't have a list of bytes*. You have a list of `int` objects. So, for example, you have the first int, `12345`, which apparently is supposed to correspond to *two bytes*, byte 48 and byte 57. Is that correct? IOW, the big-endian, 16-bit unsigned integer bytes? – juanpa.arrivillaga Aug 30 '23 at 20:37
  • 1
    So, it *sounds* like you actually want something like: `[f'{val:02x}' for i in data for val in i.to_bytes(signed=False,byteorder='big', length=2)]`... perhaps filtering out 0 – juanpa.arrivillaga Aug 30 '23 at 20:42
  • 1
    Reading your comments, it sounds like in your actual use-case you want the bytes of a 32-bit, big-endian unsigned int, but I guess stripping leading zeros? EDIT: sorry, I was getting confused, you want to keep all zero bytes. I've voted to re-open, but basically, you just want the big-endian bytes of a given length encoding (in your example, 16 bit). This is definitely achievable in a "principled", not hacky, way – juanpa.arrivillaga Aug 30 '23 at 21:02
  • @juanpa.arrivillaga (@ comment 1): Correct. I have a list of integers. Converting to a list of bytes is one of the paths I tried to take, but it didn't get me to the solution I needed. (@ comment 4):That sounds like what I need indeed. Except for the stripping of leading zeros. As some other commenters already pointed out, that was an error in my fictitious example. In the real output, there should be leading zeroes. (@ comment 2): I believe that that could be the answer to this question! Would you mind moving the comment to an answer? It's more solid and cleaner than the other answers. – Opifex Aug 30 '23 at 21:27
  • Yeah, I have a function for you, just need to wait for one more re-open vote. Can you edit the answer to add this description? @KarlKnechtel can you re-open this? The OP has clarified – juanpa.arrivillaga Aug 30 '23 at 21:29
  • @juanpa.arrivillaga my concern wasn't only about the ambiguity of the data without padding, but about the output format. Describing a goal of PGM output *almost* makes sense, but PGM is supposed to use decimal values in text, not a hex dump. That said, being able to create a hex dump *does* have utility (for example, to populate a text field in a hex editor); so regardless of whether I think the answers *actually directly solve the motivating problem*, the underlying question is entirely valid. Reopened. – Karl Knechtel Aug 30 '23 at 21:46
  • The goal isn't to output PGM P2 (which is decimal ASCII). What I mentioned earlier was binary PGM, which is PGM P5. That part is already finished, and is not related to the question anymore. Creating a hex dump of the data extracted from the initial (binary) PGM is more or less what I need to do, yes. – Opifex Aug 30 '23 at 21:51
  • @Opifex You seem to have changed your requirements thereby rendering some of the answers that were correct now incorrect. You really need to think about what you actually want when posting a question – DarkKnight Aug 31 '23 at 06:29
  • I apologize @DarkKnight. The question was correct, but the example I gave did indeed have the flaw that the values were variable size. Variable-size values would indeed, as many commenters remarked, be of little use. After it was pointed out, I corrected the example. I left the rest of the question unchanged, despite getting the explicit demand to change it. That being said, I think your answer isn't wrong. All that needs to be done is remove the hack that strips the leading zeroes. You can even split the answer up, so it shows both possibilities. – Opifex Aug 31 '23 at 09:21

5 Answers

2

There's probably a better (simpler) way but this seems to work:

def pair(s):
    for i in range(0, len(s), 2):
        yield s[i:i+2]

def ashex(n):
    yield from pair(f'0000{n:02x}'[-4:])

int_list = [12345, 6789, 7643, 1, 2]

r = []

for n in int_list:
    r.extend(list(ashex(n)))

print(' '.join(r))

Output:

30 39 1a 85 1d db 00 01 00 02
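
A possibly simpler variant of the same idea, sketched under the assumption that every value fits in 16 bits (which the `[-4:]` slice above already implies): format each integer directly as four zero-padded hex digits and cut that into pairs. The names `hex_pairs` and `digits` are hypothetical and only used for illustration.

```python
def hex_pairs(values, digits=4):
    """Yield two-character hex chunks for each integer, zero-padded to `digits`."""
    for n in values:
        text = f"{n:0{digits}x}"  # e.g. 1 -> '0001' when digits == 4
        for i in range(0, len(text), 2):
            yield text[i:i + 2]

int_list = [12345, 6789, 7643, 1, 2]
result = " ".join(hex_pairs(int_list))
print(result)  # 30 39 1a 85 1d db 00 01 00 02
```

Values above 0xffff are not handled by this sketch; they would need a larger `digits` value.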
DarkKnight
1

Okay, so this is a bit cobbled together but should do what you want:

from textwrap import wrap

int_list = [12345, 6789, 7643, 1, 2]  # here's your list of ints

# convert all the values to hex, and then join them together as a string,
# using slicing ([2:]) to strip off the '0x' hexadecimal prefix from each
# (zfill is used to pad smaller values with zeros as needed, e.g. 01 and 02)
byte_string = ''.join([hex(n)[2:].zfill(2) for n in int_list])
# => '30391a851ddb0102'

# use the 'wrap' method from the textwrap module to group the bytes into
# groups of 2, then 'join' the resulting list with spaces in between
new_bytes = ' '.join(wrap((byte_string), 2))
# => '30 39 1a 85 1d db 01 02'

It's probably worth mentioning that wrap is hardly the only way to group characters in a string (or elements of an iterable, for that matter). There are plenty of other options given in this answer if you're curious!

In case you're wondering why you have to bother with wrap (or something similar) rather than just joining the elements of the first list comprehension with whitespace, here's what happens:

byte_string = ' '.join([hex(n)[2:].zfill(2) for n in int_list])
# => '3039 1a85 1ddb 01 02'

Close, but not quite right!
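
If the leading zero bytes need to be kept, one option is to pad every value to a fixed width before joining. This is only a sketch, assuming each value is meant to occupy a fixed number of bytes (two in the example data); `to_spaced_hex` and `width` are hypothetical names.

```python
from textwrap import wrap

def to_spaced_hex(values, width=2):
    """Pad each integer to `width` bytes of hex, then split the joined string into byte pairs."""
    digits = 2 * width  # two hex digits per byte
    byte_string = "".join(format(n, f"0{digits}x") for n in values)
    return " ".join(wrap(byte_string, 2))

int_list = [12345, 6789, 7643, 1, 2]
spaced = to_spaced_hex(int_list)
print(spaced)  # 30 39 1a 85 1d db 00 01 00 02
```

Because every value contributes the same even number of hex digits, the pairs produced by `wrap` stay aligned with byte boundaries, as long as no value exceeds `width` bytes.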

JRiggles
  • This works for OP's data but is not a general solution. If the hex form of an integer in the list is represented by more than 4 characters, the output is incorrect. For example, try replacing one of the values with 768955 (0xbbbbb) to see the result – DarkKnight Aug 30 '23 at 16:39
  • @DarkKnight that's a good catch! So I suppose this solution works for decimal values *up to* 65,535 (0xFFFF). Hmmm...any thoughts on how I could generalize this further? – JRiggles Aug 30 '23 at 16:45
1

So, you want to create a hexdump-like output. Given your example inputs/outputs, it sounds like you want the big-endian, 16-bit byte representation of each integer in your list. The key is to use int.to_bytes. Here is a function that first converts each integer to a bytes object, then iterates over the individual bytes, formatting them as requested:

def hexdump(
    data: list[int], length=2, byteorder: str = "big", fmt: str = "02x", sep: str = " "
) -> str:
    bytes_data = (i.to_bytes(length=length, byteorder=byteorder) for i in data)
    return sep.join([f"{b:{fmt}}" for bs in bytes_data for b in bs])

Here are some examples of it in use:

In [2]: data = [12345, 6789, 7643, 1, 2]

In [3]: hexdump(data)
Out[3]: '30 39 1a 85 1d db 00 01 00 02'

In [4]: hexdump(data, length=4, fmt='04x')
Out[4]: '0000 0000 0030 0039 0000 0000 001a 0085 0000 0000 001d 00db 0000 0000 0000 0001 0000 0000 0000 0002'

In [5]: hexdump(data, sep="|")
Out[5]: '30|39|1a|85|1d|db|00|01|00|02'
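
On Python 3.8 or later, a similar result can be obtained from `bytes.hex`, which accepts a separator argument. A minimal sketch, assuming the same big-endian, fixed-length layout; `hexdump_builtin` is a hypothetical name and not part of the function above.

```python
def hexdump_builtin(data, length=2, byteorder="big", sep=" "):
    """Concatenate each integer's fixed-width bytes, then hex-format with a separator."""
    raw = b"".join(i.to_bytes(length, byteorder) for i in data)
    return raw.hex(sep)  # the separator argument requires Python 3.8+

data = [12345, 6789, 7643, 1, 2]
dumped = hexdump_builtin(data)
print(dumped)  # 30 39 1a 85 1d db 00 01 00 02
```

Note that `bytes.hex` only accepts a single-character separator and always emits two lowercase hex digits per byte, so it is less flexible than the `fmt` parameter above.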
juanpa.arrivillaga
  • 1
    This is indeed what I needed! (Even though the condensed format from your comment was already sufficient for my purposes) Will wait 24 hours to accept answer, as per SO etiquette. Have an upvote in the meantime! – Opifex Aug 30 '23 at 22:03
0

Another solution, using re:

import re

b = [768955, 12345, 6789, 7643, 1, 2]

out = " ".join(
    c
    for s in map("{:x}".format, b)
    for c in re.findall("..", f"{s:0>{len(s) + len(s) % 2}}")
)
print(out)

Prints:

0b bb bb 30 39 1a 85 1d db 01 02
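
The same per-value padding can also be sketched without a regular expression, assuming each non-negative integer should occupy the fewest whole bytes that hold it (which is what the even-length padding above amounts to). `min_bytes_hex` is a hypothetical name.

```python
def min_bytes_hex(values):
    """Hex-format each non-negative integer using the fewest whole bytes that hold it."""
    parts = []
    for n in values:
        length = max(1, (n.bit_length() + 7) // 8)  # at least one byte, so 0 -> '00'
        parts.extend(f"{byte:02x}" for byte in n.to_bytes(length, "big"))
    return " ".join(parts)

b = [768955, 12345, 6789, 7643, 1, 2]
out = min_bytes_hex(b)
print(out)  # 0b bb bb 30 39 1a 85 1d db 01 02
```

As the comments on the question point out, output like this is variable-width, so the original integers cannot be recovered from it without extra information.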
Andrej Kesely
0

You can take advantage of zip to give you the characters 2-at-a-time. A running list is extended with the pairs, joined by "space", and returned.

def converter(data: list):
    out = []
    # number of hex digits needed to hold the largest value, rounded up to a full byte
    places = (max(data).bit_length() + 7) // 8 * 2

    for num in data:
        byt = f'{num:0{places}x}'

        # get characters in pairs
        pairs = zip(byt[::2], byt[1::2])

        # combine and store pairs
        out.extend(''.join(pair) for pair in pairs)

    return ' '.join(out)

print(converter([12345, 6789, 7643, 1, 2]))  # 30 39 1a 85 1d db 00 01 00 02
OneMadGypsy