6

I'm doing some serial protocol stuff and want to implement a basic byte stuffing algorithm in python. I'm struggling to determine what is the most pythonic way to do it.

Byte stuffing is basically just replacing any "reserved" bytes with a pair composed of an escape byte and the original byte transformed in a reversible way (e.g. xor'ed).

So far, I've come with 5 different approaches, and each of them has something I don't like about it:

1 Via Generator

def stuff1(bits):
    for byte in bits:
        if byte in _EscapeCodes:
            yield PacketCode.Escape
            yield byte ^ 0xFF
        else:
            yield byte

This may be my favorite, but maybe just because I'm kind of fascinated by yield based generators. I worried that the generator would make it slow, but it's actually the second fastest of the bunch.

2 Simply bytes()

def stuff2(bits):
    result = bytes()
    for byte in bits:
        if byte in _EscapeCodes:
            result += bytes([PacketCode.Escape, byte ^ 0xFF])
        else:
            result += bytes([byte])
    return result

Constantly has to create single element arrays just to throw them out because I'm not aware of any "copy with one additional element" operation. It ties for the slowest of the bunch.

3 Use byte array()

def stuff3(bits):
    result = bytearray()
    for byte in bits:
        if byte in _EscapeCodes:
            result.append(PacketCode.Escape)
            result.append(byte ^ 0xFF)
        else:
            result.append(byte)
    return result

Seems better than the direct bytes() approach. Actually slower than the yield generator and can do one byte at a time (instead of needing intermediate 1 element collections). But it feels brutish. It's middle of the pack performance.

4 BytesIO()

def stuff4(bits):
    bio = BytesIO()
    for byte in bits:
        if byte in _EscapeCodes:
            bio.write(bytes([PacketCode.Escape, byte ^ 0xFF]))
        else:
            bio.write(bytes([byte]))
    return bio.getbuffer()

I like the stream based approach here. But it is annoying that there doesn't seem to be something like a write1 API that could just add 1 byte, so I have to make those intermediate bytes again. If there was a "write single byte", I'd like this one. It ties for slowest.

5 Use replace()

def stuff5(bits):
    escapeStuffed = bytes(bits).replace(bytes([PacketCode.Escape]), bytes([PacketCode.Escape, PacketCode.Escape ^ 0xFF]))
    stopStuffed = escapeStuffed.replace(bytes([PacketCode.Stop]), bytes([PacketCode.Escape, PacketCode.Stop ^ 0xFF]))
    return stopStuffed.replace(bytes([PacketCode.Start]), bytes([PacketCode.Escape, PacketCode.Start ^ 0xFF]))

This is the fastest. But I don't like the way the code reads and the intermediate sweeps.

I tried additionally to use translate(), but AFAICT, it can only translate 1:1 sequences.

Community
  • 1
  • 1
Travis Griggs
  • 21,522
  • 19
  • 91
  • 167
  • why not look at an actual package: https://pypi.org/project/cobs/ – cowbert Apr 16 '18 at 17:52
  • COBS is a particular kind of byte stuffing. I'm not trying to do that one though. – Travis Griggs Apr 16 '18 at 18:16
  • What 'particular kind' of stuffing does it do that doesn't help your application? – Sunny Patel Apr 16 '18 at 18:41
  • ByteStuffing in general just means a way of transforming "special" or escaped bytes so that reserved bytes can be used unambiguously to communicate packet framing boundaries. The simple way is to use an escape byte. This the kind I'm implementing. COBS is a "clever" trick that uses a fixed recurring block header to weave positions of byte code zero throughout the chunk. – Travis Griggs Apr 16 '18 at 18:56
  • @TravisGriggs right, but since COBS is a special case of your problem, might as well take a look at how they implemented the byte manipulation in general. The COBS module is MIT licensed so there should be no IP problems using that code either. – cowbert Apr 16 '18 at 19:49
  • @cowbert COBS is pretty different from the standard byte stuffing (escaping) mechanism. It really is an alternate to the traditional byte stuffing algorithms. – Travis Griggs Apr 19 '18 at 00:40
  • I did end up looking at COBS anyway, and it's implemented in C anyway. The python code is just transliterated from the C approach which is good old fashion for loops. Nothing too "pythonic" there. – Travis Griggs Apr 19 '18 at 00:41
  • a slightly refactored version of 5, for each code in _EscapeCodes, do replace – greedy52 Apr 19 '18 at 00:56

0 Answers0