I am trying to remove two leading bytes (\x00\x81) from a byte string sByte.

sByte = b'\x00\x81308 921 q53 246 133 137 022 1   0 1 1  1 130 C13 330 0000000199 04002201\n'

I am expecting to have as a result the following:

sByte = b'308 921 q53 246 133 137 022 1   0 1 1  1 130 C13 330 0000000199 04002201\n'

I have tried the following:

  1. I tried to decode sByte; after running the below line of code,

    sByte.decode('utf-8')
    

    I received a traceback: UnicodeDecodeError: 'utf-8' codec can't decode byte 0x81 in position 1: invalid start byte.

  2. I also tried the following, but it did not work:

    sByte.replace('\x00\x81', '')
    
  3. I also found this question:
    json - UnicodeDecodeError: 'utf8' codec can't decode byte 0x80 in position 3131: invalid start byte
    but it did not help me remove \x00\x81.

I am not sure how to remove or replace bytes in a byte string.

martineau
Joe

2 Answers

>>> sByte = b'\x00\x81308 921 q53 246 133 137 022 1   0 1 1  1 130 C13 330 0000000199 04002201\n'
>>> sByte[2:]
b'308 921 q53 246 133 137 022 1   0 1 1  1 130 C13 330 0000000199 04002201\n'

See also https://appdividend.com/2022/07/09/python-slice-notation/

The slice sByte[2:] returns sByte from the third byte (index 2) through the end, so the first two bytes are dropped.

If you want to store the result back in the same variable, you can do this:

>>> sByte = b'\x00\x81308 921 q53 246 133 137 022 1   0 1 1  1 130 C13 330 0000000199 04002201\n'
>>> sByte = sByte[2:]
>>> sByte
b'308 921 q53 246 133 137 022 1   0 1 1  1 130 C13 330 0000000199 04002201\n'
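
Slicing with a fixed offset drops the first two bytes whatever they are. If some inputs might not carry the \x00\x81 prefix, a guarded version may be safer. A minimal sketch, assuming a hypothetical helper name strip_prefix and a shortened sample value (neither is from the question):

```python
PREFIX = b'\x00\x81'  # the two leading bytes from the question


def strip_prefix(data: bytes, prefix: bytes = PREFIX) -> bytes:
    """Return data without the prefix if it starts with it, else unchanged."""
    if data.startswith(prefix):
        # Slice by the prefix length instead of a hard-coded 2
        return data[len(prefix):]
    return data


sByte = b'\x00\x81308 921 q53\n'   # shortened sample for illustration
cleaned = strip_prefix(sByte)      # prefix removed
untouched = strip_prefix(b'308')   # no prefix, returned as-is
```

This way a line that already lacks the prefix does not lose its first two real characters.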
S1LV3R

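bytes.replace needs bytes arguments (passing str, as in sByte.replace('\x00\x81', ''), raises a TypeError), and it doesn't work in place: it returns a modified copy of the bytes object. You can use sByte = sByte.replace(b'\x00\x81', b'') (or bytes.removeprefix, available from Python 3.9, if the bytes always occur at the start). Depending on your circumstances, you can also set the errors parameter of the decode method to 'ignore': sByte = sByte.decode(encoding='utf-8', errors='ignore'). Note that this returns a str rather than bytes, and it only drops the invalid 0x81 byte; \x00 is valid UTF-8 and stays in the result.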
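
The three options can be compared side by side. This is only a sketch: the sample value is shortened for illustration, and bytes.removeprefix assumes Python 3.9 or later.

```python
sByte = b'\x00\x81308 921\n'  # shortened sample for illustration

# str arguments are rejected: bytes.replace needs bytes-like arguments
try:
    sByte.replace('\x00\x81', '')
    str_args_ok = True
except TypeError:
    str_args_ok = False

# replace returns a new object and removes every occurrence, not just the first
replaced = sByte.replace(b'\x00\x81', b'')

# removeprefix (Python 3.9+) only strips the bytes at the very start
stripped = sByte.removeprefix(b'\x00\x81')

# decode with errors='ignore' yields a str; 0x81 is dropped, the NUL byte is kept
decoded = sByte.decode('utf-8', errors='ignore')
```

Here replaced and stripped are both bytes without the prefix, while decoded is a str that still begins with '\x00', so it may need a further .lstrip('\x00') or similar depending on the use case.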

isaactfa
  • Just as an "addition" to this answer, if OP wants to have a more "inclusive" substitute, they can always check out `re.sub`. For this specific answer it would be: `re.sub(b"\x00\x81", b'', sByte)` – Michael S. Aug 03 '22 at 18:33
  • @MichaelS. How does `re` behave differently in this case? – isaactfa Aug 03 '22 at 18:36
  • In this case, it does not, and `replace` is the better method. But if OP has other bytes that they want to remove with similar structures, then creating one `re.sub` that matches the format of all those other bytes could save them from doing a unique `replace` for each different byte. Again, in this case, your method is better. Just wanted OP to know about other options – Michael S. Aug 03 '22 at 18:40
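
To illustrate the comment above about covering several similar byte sequences with one `re.sub`, here is a rough sketch. The family of prefixes it matches (a NUL byte followed by any byte from 0x80 to 0x8f) is purely an assumption for demonstration; the question only mentions \x00\x81.

```python
import re

# Hypothetical pattern: NUL followed by one byte in 0x80-0x8f, only at the start.
# The rb prefix keeps the backslashes so the regex engine interprets the escapes.
PREFIX_RE = re.compile(rb'\A\x00[\x80-\x8f]')


def strip_marker(data: bytes) -> bytes:
    """Remove one leading marker of the assumed form, if present."""
    return PREFIX_RE.sub(b'', data)


a = strip_marker(b'\x00\x81308 921\n')  # the prefix from the question
b = strip_marker(b'\x00\x82308 921\n')  # a similar, made-up prefix
c = strip_marker(b'308 921\n')          # nothing to strip
```

A single pattern like this stands in for one `replace` call per distinct prefix; for the single fixed prefix in the question, `replace` or `removeprefix` remains the simpler choice.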