Reading unknown-encoded text files in Adobe AIR

Question

I've got an Adobe AIR app where users can process text files from their filesystems. I've been assuming UTF-8, but if a text file is in some other encoding (e.g. iso-8859-1 or iso-2022-kr), how should I make a best guess at its encoding so I can read the contents into a String?

FileStream.readMultiByte supports an intimidating range of character sets.

I could try File.systemCharset but there's no guarantee the file was created by the machine running my app. If I could first be sure the file wasn't UTF-8 that might be an acceptable fallback. — Sarah Northway, Jan 06 '16 at 01:27
Here's a discussion on PHP's mb_detect_encoding, but AS3 has nothing of the sort (official or user-written that I can find) http://php.net/manual/en/function.mb-detect-encoding.php — Sarah Northway, Jan 07 '16 at 18:28

score 0 · Accepted Answer · edited May 23 '17 at 12:10

You can try to guess by checking whether the file starts with a header such as a BOM (byte order mark), but you will never be 100% sure.
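The BOM check can be sketched as follows. This is TypeScript rather than AS3, and `detectBom` and its encoding labels are hypothetical names chosen for illustration; an AS3 port would read the same leading bytes from a ByteArray and map the labels to whatever charset strings readMultiByte expects.

```typescript
// Hypothetical helper: maps a leading byte order mark to an encoding label.
type BomEncoding = "utf-8" | "utf-16le" | "utf-16be" | "utf-32le" | "utf-32be";

function detectBom(bytes: Uint8Array): BomEncoding | null {
  // True when the file begins with exactly the given signature bytes.
  const startsWith = (sig: number[]): boolean =>
    bytes.length >= sig.length && sig.every((b, i) => bytes[i] === b);

  // Check the 4-byte UTF-32 marks before UTF-16: FF FE 00 00 also begins with FF FE.
  if (startsWith([0x00, 0x00, 0xfe, 0xff])) return "utf-32be";
  if (startsWith([0xff, 0xfe, 0x00, 0x00])) return "utf-32le";
  if (startsWith([0xef, 0xbb, 0xbf])) return "utf-8";
  if (startsWith([0xfe, 0xff])) return "utf-16be";
  if (startsWith([0xff, 0xfe])) return "utf-16le";

  // No BOM: the encoding is still unknown, not "definitely not Unicode".
  return null;
}
```

A `null` result only means no BOM was found; many UTF-8 files are written without one. FF FE 00 00 is also ambiguous in principle, since a UTF-16LE file whose first character is U+0000 starts the same way.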

See these other answers:

How to detect the encoding of a file?

How can I detect the encoding/codepage of a text file

EDIT: This might work as a guessing approach. It is not in AS3, but it could help: Simple class to automatically detect text file encoding, with English-biased "best guess" heuristic based on byte patterns in the absence of BOM.
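One byte-pattern check that works without a BOM is testing whether the whole file is well-formed UTF-8. The sketch below is not the linked class; it is a simplified TypeScript illustration with a hypothetical function name, `looksLikeUtf8`, and an AS3 port would walk a ByteArray the same way.

```typescript
// Returns true when every byte sequence in the input is well-formed UTF-8.
function looksLikeUtf8(bytes: Uint8Array): boolean {
  let i = 0;
  while (i < bytes.length) {
    const b0 = bytes[i];
    if (b0 < 0x80) {
      i += 1; // plain ASCII byte
      continue;
    }

    let extra: number; // continuation bytes expected after the lead byte
    let min: number; // smallest code point allowed at this length (rejects overlong forms)
    let cp: number; // code point being assembled
    if (b0 >= 0xc2 && b0 <= 0xdf) {
      extra = 1; min = 0x80; cp = b0 & 0x1f;
    } else if ((b0 & 0xf0) === 0xe0) {
      extra = 2; min = 0x800; cp = b0 & 0x0f;
    } else if (b0 >= 0xf0 && b0 <= 0xf4) {
      extra = 3; min = 0x10000; cp = b0 & 0x07;
    } else {
      return false; // stray continuation byte or a lead byte UTF-8 never uses
    }

    if (i + extra >= bytes.length) return false; // sequence cut off at end of input
    for (let k = 1; k <= extra; k++) {
      const b = bytes[i + k];
      if ((b & 0xc0) !== 0x80) return false; // not a 10xxxxxx continuation byte
      cp = (cp << 6) | (b & 0x3f);
    }

    // Reject overlong encodings, UTF-16 surrogates, and values past U+10FFFF.
    if (cp < min || cp > 0x10ffff || (cp >= 0xd800 && cp <= 0xdfff)) return false;
    i += extra + 1;
  }
  return true;
}
```

A `false` result is strong evidence that the file is not UTF-8, because text in single-byte encodings such as iso-8859-1 rarely happens to form valid multi-byte sequences. A `true` result is weaker: pure ASCII passes, and so would a 7-bit encoding such as iso-2022-kr, so this check alone cannot rule those out.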

Also, if it is an option in your app, you can default to UTF-8 and let users preview the text in another encoding of their choice.

edited May 23 '17 at 12:10 by Community

answered Jan 06 '16 at 15:19 by Nemi

I'll try to convert and use the C# class you linked to determine UTF-8/16/32/BOM/no-BOM, then fall back to File.systemCharset and let the user pick an encoding from a list. Great suggestions - thanks! – Sarah Northway Jan 07 '16 at 18:31
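The detect-then-fall-back plan in this comment could be wired up roughly like this. It is a TypeScript sketch, not AS3, and every name in it (`Detector`, `EncodingGuess`, `guessEncoding`, `utf8Bom`) is hypothetical; in AIR the fallback argument would presumably be File.systemCharset.

```typescript
// Hypothetical types: a detector returns an encoding label, or null when it cannot tell.
type Detector = (bytes: Uint8Array) => string | null;

interface EncodingGuess {
  encoding: string;
  // "fallback" tells the UI that nothing matched, so offering an encoding picker makes sense.
  source: "detected" | "fallback";
}

// Runs the detectors in order and returns the first answer, else the fallback.
function guessEncoding(
  bytes: Uint8Array,
  detectors: Detector[],
  fallback: string
): EncodingGuess {
  for (const detect of detectors) {
    const encoding = detect(bytes);
    if (encoding !== null) return { encoding, source: "detected" };
  }
  return { encoding: fallback, source: "fallback" };
}

// Example detector for illustration: recognises only the UTF-8 BOM (EF BB BF).
const utf8Bom: Detector = (b) =>
  b.length >= 3 && b[0] === 0xef && b[1] === 0xbb && b[2] === 0xbf ? "utf-8" : null;
```

Calling `guessEncoding(fileBytes, [utf8Bom], "iso-8859-1")` returns the first detector hit or the fallback. Keeping `source` in the result lets the app preselect the guess in the list while still letting the user override it.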
