1

Is there a way to try to decode a bytearray without raising an error if the encoding fails?

EDIT: The solution needn't use bytearray.decode(...). Any library (preferably standard) that does the job would be great.

Note: I don't want to ignore errors (which I could do using bytearray.decode(errors='ignore')), and I don't want an exception to be raised either. Ideally, the function would return None, for example.

my_bytearray = bytearray('', encoding='utf-8')

# ...
# Read some stream of bytes into my_bytearray.
# ...

text = my_bytearray.decode()

If my_bytearray doesn't contain valid UTF-8 text, the last line will raise an error.

Question: Is there a way to perform the validation but without raising an error?

(I realize that raising an error is considered "pythonic". Let's assume this is undesirable for some good reason.)

I don't want to use a try/except block because this code gets called thousands of times and I don't want my IDE to stop every time this exception is raised (whereas I do want it to pause on other errors).

Eric McLachlan
  • Are you familiar with [`try, except`](https://docs.python.org/3/tutorial/errors.html#handling-exceptions) blocks? Is that what you're trying to avoid? If so, could you please update your question with that and the reason for not using it? – Hampus Larsson Jul 30 '20 at 10:13
  • @HampusLarsson: I have updated my question to include my reasoning. – Eric McLachlan Jul 30 '20 at 10:44

2 Answers

6

You could use the suppress context manager to suppress the exception and have slightly prettier code than with try/except/pass:

import contextlib
...
return_val = None
with contextlib.suppress(UnicodeDecodeError):
    return_val = my_bytearray.decode('utf-8')
snakecharmerb
  • Oh! I like that! Thanks, @snakecharmerb. I'll leave the question open for a day or so in case there's an even better answer; otherwise, I'll accept yours as _the_ answer. – Eric McLachlan Jul 30 '20 at 10:46
  • Oh dang. My IDE still pauses when the exception is thrown. So, yeah, this stops the error from being reported but NOT from being thrown, unfortunately. – Eric McLachlan Jul 30 '20 at 10:52
  • Hmmm, I don't think there's a way to prevent the exception being thrown by decode. Tbh if the issue is the IDE pausing execution I'd see if the IDE could be reconfigured (or run the code outside the IDE). – snakecharmerb Jul 30 '20 at 10:55
  • Or you could parse the bytes to see if they form a valid UTF-8 sequence, but that would be slower than decoding. – snakecharmerb Jul 30 '20 at 10:56
  • Thank you. I've certainly considered writing my own UTF-8 decoder. I was hoping for another library, method, or parameter that could have the same effect but perhaps there is none? ¯\\_(ツ)_/¯ – Eric McLachlan Jul 30 '20 at 11:00
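
One standard-library possibility that the thread does not mention, offered only as a sketch: the `errors` parameter of `decode` accepts the name of a custom handler registered with `codecs.register_error`. The codec calls the handler with a `UnicodeDecodeError` object instead of raising it, so (as far as I can tell, and not verified against any particular IDE) a debugger that breaks on raised exceptions should not stop here. The names `try_decode`, `_flag_error`, and `"flag_failure"` are hypothetical, chosen for illustration.

```python
import codecs
import threading

# Per-thread flag recording whether the most recent decode hit invalid input.
_state = threading.local()


def _flag_error(exc):
    """Custom codec error handler: note the failure and skip the remaining input."""
    if not isinstance(exc, UnicodeDecodeError):
        # This sketch only handles decoding; anything else is re-raised unchanged.
        raise exc
    _state.failed = True
    # Return an empty replacement and resume at the end of the input,
    # so the codec stops looking at the remaining bytes.
    return ("", len(exc.object))


# Handler names are global to the interpreter; "flag_failure" is a made-up name.
codecs.register_error("flag_failure", _flag_error)


def try_decode(data, encoding="utf-8"):
    """Return the decoded text, or None if data is not valid in the given encoding."""
    _state.failed = False
    text = data.decode(encoding, errors="flag_failure")
    return None if _state.failed else text
```

With this in place, `text = try_decode(my_bytearray)` would stand in for `my_bytearray.decode()`, and the caller checks `text is None` instead of catching an exception.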
1

The chardet module can be used to detect the encoding of a bytearray before calling bytearray.decode(...).

The Code:

import chardet
identity = chardet.detect(my_bytearray)

The method chardet.detect(...) returns a dictionary with the following format:

{
  'confidence': 0.99,
  'encoding': 'ascii',
  'language': ''
}

One could check identity['encoding'] to confirm that my_bytearray is compatible with an expected set of text encodings before calling my_bytearray.decode().

One caveat with this approach is that the detected encoding may be just one of several compatible encodings. In this case, for instance, the result reports ASCII, although the same bytes are equally valid UTF-8.
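
A minimal sketch of that check, assuming the result dictionary has the shape shown above and that its 'encoding' entry may be None when nothing is detected. The helper name `looks_like_utf8` and the accepted set are hypothetical. Detection is heuristic, so passing this check makes a later decode failure unlikely rather than impossible.

```python
# Encodings treated as compatible with UTF-8 decoding (ASCII is a subset of UTF-8).
ACCEPTED_ENCODINGS = frozenset({"ascii", "utf-8"})


def looks_like_utf8(detection, accepted=ACCEPTED_ENCODINGS):
    """Return True if a chardet-style result names an encoding in the accepted set."""
    name = detection.get("encoding")
    if name is None:
        # Nothing was detected, so treat the data as not decodable.
        return False
    # Compare case-insensitively, since the reported name's casing can vary.
    return name.lower() in accepted
```

Usage would look like `text = my_bytearray.decode() if looks_like_utf8(identity) else None`.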

(Credit to @simon who pointed this out on StackOverflow here.)

Eric McLachlan