I want to test if a string of bytes that I'm extracting from a file results in valid ISO-8859-15 encoded text. The first thing I came across is this similar case about UTF-8 validation:
https://stackoverflow.com/a/5259160/1209004
So based on that, I thought I was being clever by doing something similar for ISO-8859-15. See the following demo code:
#! /usr/bin/env python
#
def isValidISO885915(bytes):
# Test if bytes result in valid ISO-8859-15
try:
bytes.decode('iso-8859-15', 'strict')
return(True)
except UnicodeDecodeError:
return(False)
def main():
# Test bytes (byte x95 is not defined in ISO-8859-15!)
bytes = b'\x4A\x70\x79\x6C\x79\x7A\x65\x72\x20\x64\x95\x6D\x6F\xFF'
isValidLatin = isValidISO885915(bytes)
print(isValidLatin)
main()
However, running this returns True, even though x95 is not a valid code point in ISO-8859-15! Am I overlooking something really obvious here? (BTW I tried this with Python 2.7.4 and 3.3, results are identical in both cases).