I'd like to find a way to determine whether a Unicode character belongs to a standardized subset of Unicode, specifically Basic Latin and Latin-1. I am using Python 2 and the unicodedata module, but I need a solution that also works in Python 3 because my workplace will be upgrading soon.
My current thinking is to parse the Unicode Scripts.txt file into some kind of dictionary that I can search. The problem is that the code points in that file are formatted like this:
02B9..02C1
and Unicode code points in Python look like this:
`u'\xe6'`
I do not know how to compare these two forms. My guess is that the file lists code points in hexadecimal, and that Python's `\xe6` escape is just another way of writing the same hexadecimal value.
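To make the question concrete, here is a minimal sketch of the comparison I have in mind. It assumes that `ord()` on a one-character string and `int(..., 16)` on the file's hex fields give integers that can be compared directly. The helper names `parse_range` and `in_range` are my own, not part of any library.

```python
def parse_range(field):
    """Turn '02B9..02C1' (or a single '00E6') into an inclusive (start, end) pair of ints."""
    start, _, end = field.strip().partition('..')
    # A field with no '..' describes a single code point, so reuse start as end.
    return int(start, 16), int(end or start, 16)


def in_range(char, field):
    """Return True if the single character `char` falls inside the range in `field`."""
    start, end = parse_range(field)
    return start <= ord(char) <= end


print(in_range(u'\u02bb', '02B9..02C1'))
print(in_range(u'\xe6', '02B9..02C1'))
```

If that assumption holds, the same code should run unchanged on Python 2 and on Python 3.3 or later, since the `u''` prefix is accepted by both.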
Are there any existing JSON data sets of Unicode subsets and their characters I can reference? Googling has turned up nothing. Would it be best to just make one from the Wikipedia page since the dataset is relatively small?
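If I did build the table by hand, I imagine it would look roughly like the sketch below. The two ranges are the Basic Latin (U+0000..U+007F) and Latin-1 Supplement (U+0080..U+00FF) blocks. As far as I can tell these are block names rather than script names, so they may not correspond to anything in Scripts.txt. `BLOCKS` and `block_of` are placeholder names of mine.

```python
# Hand-written table of inclusive code point ranges (an assumption about
# which subsets matter here, not data parsed from any Unicode file).
BLOCKS = {
    'Basic Latin': (0x0000, 0x007F),
    'Latin-1 Supplement': (0x0080, 0x00FF),
}


def block_of(char):
    """Return the name of the block containing `char`, or None if it is in neither."""
    code_point = ord(char)
    for name, (start, end) in BLOCKS.items():
        if start <= code_point <= end:
            return name
    return None


print(block_of(u'a'))
print(block_of(u'\xe6'))
print(block_of(u'\u02bb'))
```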