I am migrating code to Python 3.6. Unpacking and assigning to a list worked in Python 2.6, where unpacked string fields were plain str values; in 3.6 they are returned as bytes objects. Integer values are represented correctly in the list, but string fields are still bytes, e.g. b'B'.

The source data is a binary file containing various messages of various lengths. These messages are successfully unpacked and stored in a list.

Raw bytes of a sample message:

b'\x07\x88g\xe0b\xe5]\xc5\x00\x01j\xdd\x00\x01\xff\xdcB\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00\n\x00\x00\x03\xe8\x00\x00\x02'
Unpacked data - using '>I Q I c I Q i H B' on the raw byte values above
[126380000, 7126205086073711325, 131036, b'B', 1, 10, 1000, 0, 2]
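
For reference, the sample can be reproduced with a short script. This is only a minimal sketch using the standard `struct` module; the names `raw`, `fmt` and `fields` are placeholders chosen for illustration and do not come from the real code.

```python
import struct

# Sample message copied from above (36 bytes), split over lines for readability.
raw = (b'\x07\x88g\xe0b\xe5]\xc5\x00\x01j\xdd\x00\x01\xff\xdcB'
       b'\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00\n'
       b'\x00\x00\x03\xe8\x00\x00\x02')
fmt = '>I Q I c I Q i H B'

fields = list(struct.unpack(fmt, raw))
print(fields)

# In Python 3 the 'c' format yields a length-1 bytes object, not a str.
print(type(fields[3]))
```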

End state: a generic solution that detects any bytes value (b'...') in a list, at whatever index it appears for a given message, and converts it to a normal string value,

or one that avoids storing string values as bytes during the unpack in the first place.

Current :  [126380000, 7126205086073711325, 131036, b'B', 1, 10, 1000, 0, 2]
End state: [126380000, 7126205086073711325, 131036, B, 1, 10, 1000, 0, 2]

Note that b'B' should simply be represented as B.

I have searched Google and Stack Overflow for an answer, but only found generic decode examples.

Thanks in advance

Faiqa Saleem
Kopl

2 Answers

AFAIK, struct.unpack has no format character that produces a str; in Python 3 the character formats (c, s, p) always return bytes.

You can use map to decode each bytes-type list item to a string.

org = [126380000, 7126205086073711325, 131036, b'B', 1, 10, 1000, 0, 2]
res = list(map(lambda i: i.decode("utf-8") if isinstance(i, bytes) else i, org))

EDIT

As suggested, it can be simpler to use a list comprehension instead of map.

res = [i.decode("utf-8") if isinstance(i, bytes) else i for i in org]

I recommend going through the discussions in List comprehension vs map to see when to use one over the other (e.g. performance with long/large lists, readability, with/without lambdas, etc.).
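
If the same conversion is needed for many message types, it could be wrapped in small helpers. The following is a rough sketch, not part of the original code: `decode_fields` and `unpack_decoded` are hypothetical names, and the `encoding` default of "utf-8" is an assumption carried over from the snippets above. Note that a lone byte above 0x7F is not valid UTF-8, so binary data with such bytes may need a different encoding (for example "latin-1", which maps every byte to a character) or a non-strict `errors` value.

```python
import struct

def decode_fields(fields, encoding="utf-8", errors="strict"):
    """Return a new list with every bytes item decoded to str (hypothetical helper)."""
    return [f.decode(encoding, errors) if isinstance(f, bytes) else f
            for f in fields]

def unpack_decoded(fmt, data, encoding="utf-8"):
    """Unpack with struct, then decode any bytes fields in one step (hypothetical helper)."""
    return decode_fields(struct.unpack(fmt, data), encoding)

org = [126380000, 7126205086073711325, 131036, b'B', 1, 10, 1000, 0, 2]
res = decode_fields(org)
print(res)  # [126380000, 7126205086073711325, 131036, 'B', 1, 10, 1000, 0, 2]

# Decoding directly at unpack time, using a tiny made-up two-field message.
print(unpack_decoded('>c H', b'B\x00\x02'))  # ['B', 2]
```

The non-bytes items are passed through unchanged, so the helper works regardless of where the character fields sit in a given message.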

Gino Mempin
  • Isn't it simpler to write `res = [i.decode("utf-8") if isinstance(i, bytes) else i for i in org]`? – 6502 Apr 24 '19 at 05:55
  • @6502 Ahh yes, using list comprehension looks indeed simpler. I've edited it in, and added a link to a related post on list comps vs. maps. – Gino Mempin Apr 24 '19 at 06:10
  • Excellent! Thank you very much. I will profile both versions. I tried to upvote, I don't have enough kudos for it to be public :) – Kopl Apr 24 '19 at 07:09
map

        import timeit

        mysetup = "fields = [126380000, 7126205086073711325, 131036, b'B', 1, 10, 1000, 0, 2]"
        mycode = 'fields = list(map(lambda i: i.decode("utf-8") if isinstance(i, bytes) else i, fields))'
        print(timeit.timeit(setup=mysetup, stmt=mycode, number=100000))

Time: 0.24705234917252444

List comprehension

        mysetup = "fields = [126380000, 7126205086073711325, 131036, b'B', 1, 10, 1000, 0, 2]"
        mycode = 'fields = [i.decode("utf-8") if isinstance(i, bytes) else i for i in fields]'
        print(timeit.timeit(setup=mysetup, stmt=mycode, number=100000))

Time: 0.1520654000212543

The list comprehension is faster on this sample: about 0.15 s versus 0.25 s for 100,000 runs.
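
One caveat with the timed statements above: each one rebinds `fields`, so after the first iteration the list no longer contains a bytes item and the decode branch is barely exercised. A variant that keeps the input unchanged might look like the sketch below. The function names `with_map` and `with_comprehension` are placeholders of mine, and taking the minimum of several repeats is just one common way to reduce noise; absolute timings will differ between machines.

```python
import timeit

fields = [126380000, 7126205086073711325, 131036, b'B', 1, 10, 1000, 0, 2]

def with_map():
    # Returns a new list; the module-level input list is left untouched.
    return list(map(lambda i: i.decode("utf-8") if isinstance(i, bytes) else i, fields))

def with_comprehension():
    return [i.decode("utf-8") if isinstance(i, bytes) else i for i in fields]

# Both approaches should give the same result before they are compared on speed.
assert with_map() == with_comprehension()

for func in (with_map, with_comprehension):
    # Best of three repeats of 100000 calls each.
    best = min(timeit.repeat(func, number=100000, repeat=3))
    print(f"{func.__name__}: {best:.4f} s per 100000 calls")
```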

Kopl