So far I am doing something like this:

def is_utf8(s):
    try:
        x=bytes(s,'utf-8').decode('utf-8', 'strict')
        print(x)
        return 1
    except:
        return 0

The only problem is that I don't want it to print anything. I want to delete the print(x), but when I do that, the function stops working correctly. For example, if I call print(is_utf8("H�tst")) while the print is in the function, it returns 0; otherwise it prints 1. Am I approaching the problem in the wrong way?

Xantium
E. Meinl
  • Possible duplicate of [How to check if a string in Python is in ASCII?](https://stackoverflow.com/questions/196345/how-to-check-if-a-string-in-python-is-in-ascii) – Azsgy Mar 25 '18 at 19:20
  • This question is a bit confused. If you want to check if a string is utf8-encoded, then there's no need to print the string. What printing the string does is that it throws an error if your terminal's character set can't handle one of the characters in the string. So the result of your little function actually depends on the user's terminal settings. What you probably _really_ want to do is to find out if there are any non-ascii characters in the string. – Aran-Fey Mar 25 '18 at 19:34
  • What is `sys.stdout.encoding`? – tdelaney Mar 25 '18 at 19:37
  • Oh, and what is `s`? If it's a Python string, then it's always UTF-8 encodable. – tdelaney Mar 25 '18 at 19:38
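
The point made in the comments can be sketched as follows, assuming `s` is an ordinary Python 3 `str`. Encoding such a string to UTF-8 succeeds, so the `bytes(s, 'utf-8')` round trip in the question cannot fail on its own; the failure most likely comes from `print`, which has to encode the text for the terminal. The helper names `can_encode` and `is_ascii` below are hypothetical and only illustrate the idea.

```python
def can_encode(text: str, encoding: str) -> bool:
    """Return True if every character of `text` can be represented in `encoding`."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


def is_ascii(text: str) -> bool:
    """Return True if `text` contains only ASCII characters."""
    return can_encode(text, "ascii")
```

To test whether a string is printable on the current terminal, one could pass `sys.stdout.encoding` as the second argument, keeping in mind that this value depends on the user's environment and may not always be set.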

1 Answer

You could use the chardet module to detect an unknown encoding. For example, if `a` is a byte array, you could determine the encoding like this:

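import chardet  # third-party package: pip install chardet

a = "Hätst".encode("utf-8")  # sample input (an assumption); any bytes object works
b = chardet.detect(a)  # dict with "encoding" and "confidence" keys
print(b["encoding"])   # chardet's best guess at the encoding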
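
If the goal is only to check whether some bytes are valid UTF-8, rather than to guess an unknown encoding, a strict decode with the standard library may be enough. This is a minimal sketch, not part of chardet; it assumes the input is a `bytes` object, and the name `is_utf8_bytes` is hypothetical.

```python
def is_utf8_bytes(data: bytes) -> bool:
    """Return True if `data` decodes as UTF-8 without errors."""
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


# Valid UTF-8: "ä" is encoded as the two bytes 0xC3 0xA4.
valid = "H\u00e4tst".encode("utf-8")
# Illustrative Latin-1 bytes: a lone 0xE4 is not a valid UTF-8 sequence.
invalid = b"H\xe4tst"

print(is_utf8_bytes(valid), is_utf8_bytes(invalid))
```

Catching only `UnicodeDecodeError`, instead of using a bare `except:`, keeps unrelated errors (such as passing a `str` by mistake) from being silently reported as "not UTF-8".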
Xantium