For example, I have the literal string,
00180012001a001b...
And I need to convert this literal string of hex characters to bytes, figure out what the right encoding is, and decode the result.
There is no single right answer here when it comes to the encoding; it is something I'll have to figure out. The important thing is to start from what Python has to offer out of the box.
Anyhow, there seems to be a natural partition of the characters into 16-bit units (four hex digits each):
[ '0018', '0012', '001a', '001b', ... ] <- '00180012001a001b...'
and I'd like to split it that way in some reasonably straightforward and robust fashion. How does Python 3 handle this?
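For what it's worth, here is a minimal sketch of the split I have in mind. It assumes the string really is a run of 16-bit code units written as four hex digits each, which is my reading of the data rather than anything the PDF guarantees:
# split the hex text into 4-character (16-bit) units, then into raw bytes
hex_text = '00180012001a001b'
units = [hex_text[i:i + 4] for i in range(0, len(hex_text), 4)]
# ['0018', '0012', '001a', '001b']
raw = bytes.fromhex(hex_text)  # b'\x00\x18\x00\x12\x00\x1a\x00\x1b'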
For context, I am pulling content from PDFs. The PDF mixes encodings and has an overall encoding of "ascii".
I start with a string as follows:
import chardet

# original string; I've anonymized the actual bytes.
# the hex strings are broken up by PDF text-positioning operators
original = '[ <00010002> 1 <000300040005> 1.00708 <00060007> ] TJ'
parsed = '0001000200030004000500060007'
# obviously this won't work: it just ASCII-encodes the hex digit characters
bparsed = parsed.encode('ascii')
bparsed_encoding = chardet.detect(bparsed)  # {'encoding': 'ascii', ...}
# what I want to see reported is: WinAnsiEncoding
So the question is: how do I turn this string of hex digits into the bytes it actually represents, rather than ASCII-encoding the digit characters themselves?
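To make that concrete, here is the direction I've been poking at, as a sketch; bytes.fromhex is standard library, and the comments reflect my assumptions rather than verified output:
# interpret each pair of hex digits as one byte instead of encoding the characters
raw = bytes.fromhex(parsed)  # b'\x00\x01\x00\x02\x00\x03\x00\x04\x00\x05\x00\x06\x00\x07'
guess = chardet.detect(raw)  # chardet will never report a PDF-specific name like WinAnsiEncoding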
I've included more than enough information to answer the question I'm asking; there is no need to dive into more context.