
I have a bytestring and want to process each byte in it. One way to do this is with map(). However, because of the absurd behavior described in "Why do I get an int when I index bytes?", indexing or iterating over a bytestring yields integers (and there is no way to prevent this conversion), so map() passes each byte as an integer instead of as bytes. For example, consider the following code:

def test_function(value):
  print(type(value))

before = b'\x00\x10\x00\x00\x07\x80\x00\x03'
print("After with map")
# Iterating over bytes yields ints, so map() hands ints to test_function
after_with_map = list(map(test_function, before[:]))
print("After without map")
for i in range(len(before)):
  # A length-1 slice is still a bytes object
  test_function(before[i:i+1])

"After with map" will print:

<class 'int'>
<class 'int'>
<class 'int'>
<class 'int'>
<class 'int'>
<class 'int'>
<class 'int'>
<class 'int'>

"After without map" will print:

<class 'bytes'>
<class 'bytes'>
<class 'bytes'>
<class 'bytes'>
<class 'bytes'>
<class 'bytes'>
<class 'bytes'>
<class 'bytes'>

Is there any way to force map() to pass each byte as bytes and not as an integer?

LLL
  • no, because there's no `byte` type in python. Only byte*s* – gog Oct 03 '22 at 14:50
  • @gog ...... edited the post to replace all byte with bytes :\ ....... – LLL Oct 03 '22 at 14:52
  • Can you be a little more clear about why you care about the actual type? A byte is an integer that happens to fit in one byte of storage. But all of the operations that you want to do should just be integer operations. (Unless you want to operate on the hexadecimal representation?) – Andrew Jaffe Oct 03 '22 at 14:54
  • when you print `before[i:i+1]` it isn't selecting a single byte, it's a slice of a bytes object of length 1 – John M. Oct 03 '22 at 14:56
  • I think you misunderstand @gog's point: it's not that you should have `bytes` without map, it's that there is no such thing as a single `byte`. The "without map" output is showing a set of `bytes` sequences each of which has length 1. – Andrew Jaffe Oct 03 '22 at 14:58
  • @AndrewJaffe I suspect OP's point is that's not how regular strings work. `map` on those does indeed result in strings of length 1, so even though there's no "char" (or "rune", to use Gospeak) type they don't get converted to `int`s or anything else, just sliced up. – Mark Reed Oct 03 '22 at 15:06

3 Answers


What is the goal here? The problem, if it can be called that, is that there is no byte type. That is, there is no type in Python to represent a single byte, only a collection of bytes. Probably because the smallest Python values are all multiple bytes in size.

This is a difference between bytestrings and regular strings; when you map a string, you get strings of length 1, while when you map a bytestring, you get ints instead. You could probably make an argument that Python should do the same thing for strings (mapping to Unicode codepoints) for consistency, but regardless, mapping a bytes doesn't get you byteses. But since you know that, it should be easy to work around?
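A minimal sketch of the asymmetry described above. The variable names are illustrative and do not come from the original post:

```python
text = "abc"
data = b"abc"

# Iterating over a str yields length-1 str objects.
str_items = list(text)

# Iterating over a bytes object yields ints in range(256).
bytes_items = list(data)

# Indexing gives an int, while slicing keeps the bytes type.
first_as_int = data[0]
first_as_bytes = data[0:1]

print(str_items)                     # ['a', 'b', 'c']
print(bytes_items)                   # [97, 98, 99]
print(first_as_int, first_as_bytes)  # 97 b'a'
```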

Mark Reed

The least ugly solution I can come up with is an eager one that unpacks using the struct module, whose C code is one of the few things in Python that natively converts to and from length 1 bytes objects:

import struct

before = b'...'

# Returns tuple of len 1 bytes
before_as_len1_bytes = struct.unpack(f'{len(before)}c', before)

A few more solutions, with varying levels of ugliness, that I came up with before settling on struct.unpack as the cleanest:

  1. Decode the bytes using latin-1 (the 1-1 bytes to str encoding that directly maps each byte value to the equivalent Unicode ordinal), then map that and encode each length 1 str back to a length 1 bytes:

    from operator import methodcaller  # At top of file
    
    before = b'...'
    # Makes iterator of length 1 bytes objects
    before_as_len1_bytes = map(methodcaller('encode', 'latin-1'), before.decode('latin-1'))
    
  2. Use re.findall to quickly convert from bytes to list of length 1 bytes:

    import re  # At top of file
    
    before = b'...'
    
    # Makes list of length 1 bytes objects; re.DOTALL is needed because
    # '.' would otherwise skip newline bytes (b'\n')
    before_as_len1_bytes = re.findall(rb'.', before, flags=re.DOTALL)
    
  3. Use a couple map invocations to construct slice objects then use them to index as you do manually in your loop:

    # Iterator of length 1 bytes objects
    before_as_len1_bytes = map(before.__getitem__, map(slice, range(len(before)), range(1, len(before) + 1)))
    
  4. Use struct.iter_unpack in one of a few ways which can then be used to reconstruct bytes objects:

    import struct                    # At top of file for both approaches
    from operator import itemgetter  # At top of file for second approach
    
    # Iterator of length 1 bytes objects
    before_as_len1_bytes = map(bytes, struct.iter_unpack('B', before))
    
    # Or a similar solution that makes the bytes directly inside tuples that must be unpacked
    before_as_len1_bytes = map(itemgetter(0), struct.iter_unpack('c', before))
    

In practice, you probably don't need to do this, and should not, but those are some of the available options.
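As a rough sanity check, the sketch below runs each option on a sample value and compares the result with the manual slicing loop from the question. The sample data and the `candidates` dictionary are assumptions made for illustration. The sample deliberately contains a newline byte, which the regex pattern `.` only matches when `re.DOTALL` is set.

```python
import re
import struct
from operator import itemgetter, methodcaller

# Sample data (an assumption for this sketch); includes a newline byte.
before = b'\x00\x10\n\x07\x80\xff'

# Reference result: the manual slicing approach from the question.
expected = [before[i:i + 1] for i in range(len(before))]

n = len(before)
candidates = {
    'struct.unpack': list(struct.unpack(f'{n}c', before)),
    'latin-1 round trip': list(
        map(methodcaller('encode', 'latin-1'), before.decode('latin-1'))
    ),
    're.findall with DOTALL': re.findall(rb'.', before, flags=re.DOTALL),
    'slice objects': list(
        map(before.__getitem__, map(slice, range(n), range(1, n + 1)))
    ),
    "iter_unpack('B')": list(map(bytes, struct.iter_unpack('B', before))),
    "iter_unpack('c')": list(map(itemgetter(0), struct.iter_unpack('c', before))),
}

# Report whether each approach matches the reference result.
for name, result in candidates.items():
    print(f'{name}: {result == expected}')
```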

ShadowRanger

I don't think there's any way to keep map from seeing integers instead of bytes. But you can easily convert back to bytes when you're done, as long as the mapped function returns an integer in range(256) for each element.

after_with_map = bytes(map(test_function, before[:]))
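A minimal sketch of that round trip, assuming a hypothetical `invert` function (not from the question) that takes an int and returns an int in range(256). The question's own `test_function` returns None, so it would not work with bytes() as written.

```python
def invert(value):
    # value arrives as an int in range(256); flip all eight bits
    # so the result stays in the same range.
    return value ^ 0xFF

before = b'\x00\x10\x00\x00\x07\x80\x00\x03'

# map() passes ints to invert(); bytes() packs the returned ints back up.
after_with_map = bytes(map(invert, before))

print(after_with_map)  # b'\xff\xef\xff\xff\xf8\x7f\xff\xfc'
```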
Mark Ransom