I've got an Adobe AIR app where users can process text files from their filesystems. I've been assuming UTF-8, but if a text file is in some other encoding (e.g. iso-8859-1 or iso-2022-kr), how can I make a best guess at its encoding so I can read the contents into a String?

FileStream.readMultiByte supports an intimidating range of types.

Sarah Northway
  • I could try File.systemCharset but there's no guarantee the file was created by the machine running my app. If I could first be sure the file wasn't UTF-8 that might be an acceptable fallback. – Sarah Northway Jan 06 '16 at 01:27
  • Here's a discussion on PHP's mb_detect_encoding, but AS3 has nothing of the sort (official or user-written that I can find) http://php.net/manual/en/function.mb-detect-encoding.php – Sarah Northway Jan 07 '16 at 18:28

1 Answer

You can try to guess by checking whether the file starts with a header such as a byte order mark (BOM). Many files have no BOM, though, so you will never be 100% sure.
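As a rough illustration of the BOM check, here is a minimal sketch. It is written in TypeScript rather than AS3, and `detectBom` and its return labels are hypothetical names, not part of any AIR API; the same byte comparisons should port to a `ByteArray` read from the file, and the labels would still need mapping to whatever charset names your runtime accepts.

```typescript
// Hypothetical helper: name and return labels are illustrative only.
// Returns a label for the BOM found at the start of the data, or null.
function detectBom(bytes: Uint8Array): string | null {
  const startsWith = (sig: number[]): boolean =>
    bytes.length >= sig.length && sig.every((b, i) => bytes[i] === b);

  // Check the 4-byte UTF-32 marks first: FF FE 00 00 also begins with
  // the UTF-16LE mark FF FE. (A UTF-16LE file whose first character is
  // U+0000 would be misread here; this is a guess, not a proof.)
  if (startsWith([0x00, 0x00, 0xfe, 0xff])) return "utf-32be";
  if (startsWith([0xff, 0xfe, 0x00, 0x00])) return "utf-32le";
  if (startsWith([0xef, 0xbb, 0xbf])) return "utf-8";
  if (startsWith([0xfe, 0xff])) return "utf-16be";
  if (startsWith([0xff, 0xfe])) return "utf-16le";

  return null; // no BOM: the encoding is still unknown
}
```

A `null` result only means no BOM was present, so some other guess or a user choice is still needed in that case.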

See these other answers:

How to detect the encoding of a file?

How can I detect the encoding/codepage of a text file

EDIT: This might work as a guessing approach. It is not in AS3, but it could help: Simple class to automatically detect text file encoding, with English-biased "best guess" heuristic based on byte patterns in the absence of BOM.
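One byte-pattern check that works without a BOM is testing whether the data is well-formed UTF-8. The sketch below is not taken from the linked class; it is a TypeScript illustration with a hypothetical name, `looksLikeUtf8`, and the loop should translate to AS3 over a `ByteArray`. It assumes the whole file (or a sample that does not end mid-character) is passed in.

```typescript
// Hypothetical helper: returns true if the bytes are well-formed UTF-8.
// Plain ASCII also passes, since ASCII is a subset of UTF-8.
function looksLikeUtf8(bytes: Uint8Array): boolean {
  let i = 0;
  while (i < bytes.length) {
    const b = bytes[i];
    if (b <= 0x7f) { i += 1; continue; } // single-byte (ASCII)

    // Number of continuation bytes implied by the lead byte.
    let extra = 0;
    if (b >= 0xc2 && b <= 0xdf) extra = 1;
    else if (b >= 0xe0 && b <= 0xef) extra = 2;
    else if (b >= 0xf0 && b <= 0xf4) extra = 3;
    else return false; // stray continuation byte or invalid lead byte

    if (i + extra >= bytes.length) return false; // sequence is cut off

    // The second byte has a narrower range for some lead bytes, which
    // rejects overlong forms, surrogates, and values above U+10FFFF.
    let lo = 0x80;
    let hi = 0xbf;
    if (b === 0xe0) lo = 0xa0;
    else if (b === 0xed) hi = 0x9f;
    else if (b === 0xf0) lo = 0x90;
    else if (b === 0xf4) hi = 0x8f;
    if (bytes[i + 1] < lo || bytes[i + 1] > hi) return false;

    // Any remaining continuation bytes must be in 0x80..0xBF.
    for (let k = 2; k <= extra; k++) {
      if (bytes[i + k] < 0x80 || bytes[i + k] > 0xbf) return false;
    }
    i += extra + 1;
  }
  return true;
}
```

A failure is strong evidence that the file is not UTF-8, since text in single-byte encodings such as iso-8859-1 rarely forms valid multi-byte sequences by accident. A pass is weaker evidence: 7-bit encodings such as iso-2022-kr contain only ASCII-range bytes and will pass too.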

Also, if it is an option in your app, you can default to UTF-8 and let users preview the text in another encoding of their choice.

Nemi
  • I'll try to convert and use the C# class you linked to determine UTF-8/16/32/BOM/no-BOM, then fall back to File.systemCharset and let the user pick an encoding from a list. Great suggestions - thanks! – Sarah Northway Jan 07 '16 at 18:31