I am using the idna library to encode/decode Unicode domain names, but when I encode a domain name, it adds apostrophes on either side of the string and prepends the letter b.

For example:

import idna
print(idna.encode('español.com'))

Output: b'xn--espaol-zwa.com'

Expected output: xn--espaol-zwa.com

I feel like I'm missing something really obvious but not sure how to get to the bottom of this.

My expected output is reinforced by the fact that when I decode it:

print(idna.decode('xn--espaol-zwa.com'))

I get the original domain: español.com

Barmar
Mr Fett
  • `encode()` returns a byte string, not a character string. – Barmar May 05 '23 at 23:35
  • You can even see this in the example on the [homepage](https://pypi.org/project/idna/) – Barmar May 05 '23 at 23:36
  • If you want to convert the byte string to a string, see https://stackoverflow.com/questions/606191/convert-bytes-to-a-string – Barmar May 05 '23 at 23:37
  • [This question and answer](https://stackoverflow.com/q/6224052/fnord) explain what you're seeing. – arnt May 06 '23 at 01:40
  • @Barmar, thanks for the pointers. I expected encode and decode to take and return the same type, so this threw me, especially as I'm used to the PHP IDNA function, which just outputs character strings! Much appreciated. – Mr Fett May 06 '23 at 07:00

1 Answer

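For any newbies like me looking for a simple solution to this: as @Barmar has pointed out, idna.encode returns a byte string even if you feed in a character string. The b prefix and the quotes are just how Python prints a bytes object; they are not part of the encoded name.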

If you want a string, you can chain UTF-8 decoding thus:

idna.encode('español.com').decode('utf-8')

This outputs the character string: xn--espaol-zwa.com

idna.decode will correctly decode this back to español.com without any further treatment needed.
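
To make the difference between the two types visible, here is a minimal sketch that prints the type before and after decoding. The helper name to_ascii_bytes is hypothetical, and the fallback to Python's built-in "idna" codec is my own assumption, added only so the snippet still runs where the idna package is not installed (the built-in codec implements the older IDNA 2003 rules, so it is not a general substitute). Because the encoded form only contains ASCII characters, decoding with 'ascii' gives the same result as 'utf-8'.

```python
try:
    import idna  # third-party package used in the question (pip install idna)

    def to_ascii_bytes(domain: str) -> bytes:
        """Hypothetical helper: encode a Unicode domain with the idna package."""
        return idna.encode(domain)
except ImportError:
    # Assumption for illustration only: fall back to the built-in
    # "idna" codec (IDNA 2003) when the idna package is unavailable.
    def to_ascii_bytes(domain: str) -> bytes:
        return domain.encode("idna")

encoded = to_ascii_bytes("español.com")
print(type(encoded).__name__, encoded)  # bytes b'xn--espaol-zwa.com'

# bytes -> str; the b'...' wrapper disappears because it was only
# the printed representation of a bytes object.
text = encoded.decode("ascii")
print(type(text).__name__, text)  # str xn--espaol-zwa.com
```

If the value is going straight into something that accepts bytes (a socket or a DNS library, for example), you may not need the decode step at all.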

Mr Fett