IDNA Encode Adding Apostrophes and letter B?

Question

I am using the IDNA library to encode/decode Unicode domain names, but when I encode a domain name, the output has apostrophes on either side of the string and the letter b prepended.

For example:

import idna
print(idna.encode('español.com'))

Output: b'xn--espaol-zwa.com'

Expected output: xn--espaol-zwa.com

I feel like I'm missing something really obvious but not sure how to get to the bottom of this.

My expected output is reinforced by the fact if I decode it:

print(idna.decode('xn--espaol-zwa.com'))

I get the original domain: español.com

You can even see this in the example on the [homepage](https://pypi.org/project/idna/) — Barmar, May 05 '23 at 23:36
If you want to convert the byte string to a string, see https://stackoverflow.com/questions/606191/convert-bytes-to-a-string — Barmar, May 05 '23 at 23:37
[This question and answer](https://stackoverflow.com/q/6224052/fnord) explain what you're seeing. — arnt, May 06 '23 at 01:40
@Barmar, thanks for the pointers - I just expected encode/decode to only input/output the same format so it was throwing me, especially as I'm used to the PHP IDNA function which just outputs character strings! Much appreciated. — Mr Fett, May 06 '23 at 07:00

score 0 · Accepted Answer · answered May 06 '23 at 07:05

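For any newbies like me looking for a simple solution to this: as @Barmar has pointed out, idna.encode returns a byte string (bytes) even if you feed in a character string (str). The b prefix and the apostrophes are just how Python prints a bytes object; they are not part of the encoded domain.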

If you want a string, you can chain UTF-8 decoding thus:

idna.encode('español.com').decode('utf-8')

This outputs the character string xn--espaol-zwa.com

idna.decode will correctly decode this back to español.com without any further treatment needed.
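
To make the bytes-versus-str difference concrete, here is a minimal sketch of the round trip. The names encode_domain, decode_domain and to_ascii_str are hypothetical helpers, not part of the idna package. The fallback branch is an assumption added only so the snippet runs where idna is not installed: it uses Python's built-in "idna" codec, which follows the older IDNA 2003 rules and may differ from the idna package for some domains.

```python
try:
    import idna  # third-party package from the question (pip install idna)

    def encode_domain(domain: str) -> bytes:
        return idna.encode(domain)

    def decode_domain(ascii_domain: str) -> str:
        return idna.decode(ascii_domain)

except ImportError:
    # Assumption: fall back to the built-in "idna" codec (IDNA 2003 rules).
    def encode_domain(domain: str) -> bytes:
        return domain.encode("idna")

    def decode_domain(ascii_domain: str) -> str:
        return ascii_domain.encode("ascii").decode("idna")


def to_ascii_str(domain: str) -> str:
    """Hypothetical helper: return the encoded domain as str, not bytes."""
    # The encoded form is pure ASCII, so "ascii" and "utf-8" decode identically.
    return encode_domain(domain).decode("ascii")


raw = encode_domain("español.com")
text = to_ascii_str("español.com")

print(type(raw).__name__, raw)    # a bytes object; print shows its b'...' repr
print(type(text).__name__, text)  # a str; print shows the bare text
print(decode_domain(text))        # back to the Unicode domain
```

Decoding with "ascii" rather than "utf-8" is only a stylistic choice here; both give the same str because the encoded domain contains ASCII characters only.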
