
I am having a hard time loading a text file into a string list in FireMonkey on OS X when the encoding of the text file is not known.

When I just use `list.LoadFromFile(filename)`, I get an exception regarding encoding most of the time.

`list.LoadFromFile(filename, TEncoding.Unicode)` also fails when the file is ANSI, and specifying ANSI fails when the file is Unicode.

There is no issue on Windows, where `list.LoadFromFile(filename)` just works, but it fails on OS X.

I can't specify the encoding, because it is unknown (users provide the text files).

Any idea how I can get around this encoding issue when running the app on a Mac?

  • Try to detect the character encoding first. – Ilyes Jan 20 '17 at 12:54
  • To detect the encoding, try `file -I filename`. See [How do I determine file encoding in OSX?](http://stackoverflow.com/q/539294/576719). – LU RD Jan 20 '17 at 12:57
  • If this does not work, read [How can I detect the encoding/codepage of a text file](http://stackoverflow.com/a/90956/576719). – LU RD Jan 20 '17 at 13:01
  • Note that when `LoadFromFile()` is called without a `TEncoding`, it checks the file for a BOM, and if none is detected it then falls back to `TEncoding.Default`, which on Windows is the user's current ANSI code page, but is UTF-8 on other platforms. – Remy Lebeau Jan 20 '17 at 18:59

1 Answer


In general this is not possible. It is quite possible to create a single file that is valid when interpreted in all common encodings. This has been discussed many times, for instance: The Notepad file encoding problem, redux.
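To illustrate the ambiguity, here is a minimal sketch in Python (chosen only because it is compact; it is not Delphi code, and the helper name `decodings` and the sample bytes are made up for this illustration). It shows one byte sequence that decodes without error under three different encodings, yet yields different text:

```python
def decodings(data: bytes, encodings=("utf-8", "cp1252", "utf-16-le")) -> dict:
    """Return {encoding: decoded text} for every encoding that accepts the bytes."""
    results = {}
    for name in encodings:
        try:
            results[name] = data.decode(name)
        except UnicodeDecodeError:
            # This encoding rejects the bytes, so it is ruled out.
            pass
    return results


# Twelve plain ASCII bytes: valid UTF-8, valid Windows-1252, and also
# valid UTF-16-LE (each byte pair happens to form a legal code unit).
sample = b"hello world!"
for name, text in decodings(sample).items():
    print(f"{name:10} -> {text!r}")
```

None of the three decodings raises an error, so validity alone cannot tell you which one the author of the file intended.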

I'm assuming that you are working with files that do not contain byte order marks, BOMs. Obviously if your input files contained BOMs then you could simply check the BOM and be done.
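For completeness, a BOM check is only a few lines. The sketch below is in Python rather than Delphi, and `sniff_bom` is a hypothetical helper name; the BOM byte sequences themselves come from Python's standard `codecs` module. As the comment on the question notes, `LoadFromFile()` already performs a BOM check of its own, so this is only meant to show the logic:

```python
import codecs

# UTF-32 must be tested before UTF-16: the UTF-32-LE BOM (FF FE 00 00)
# begins with the UTF-16-LE BOM (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(data: bytes):
    """Return (encoding name, BOM length), or (None, 0) if no BOM is present."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0


print(sniff_bom(b"\xef\xbb\xbfabc"))  # bytes starting with the UTF-8 BOM
print(sniff_bom(b"abc"))              # no BOM at all
```

One caveat with this ordering: a UTF-16-LE file whose first character after the BOM is U+0000 would be reported as UTF-32-LE. That is rare in text files, but it shows that even BOM sniffing involves a small assumption.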

With that assumption stated, the right solution to the problem, in a perfect world, is to know the encoding. Either pick a specific encoding which your program requires, or arrange for the user to tell you the encoding when they supply the file.

If, for whatever reason, you cannot do that then the next best thing to do is to use heuristics to attempt to guess the encoding used. I'm not aware of any Pascal code to do this. But you should be able to put something together that will work reasonably well. This answer gives an outline of a basic strategy: https://stackoverflow.com/a/20747074
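As a rough illustration of what such a heuristic might look like, here is a sketch in Python (not Pascal, and not necessarily the strategy from the linked answer). The function names, the 40% NUL-byte threshold, and the `cp1252` fallback are all assumptions made for this example; pick a fallback that matches where your users' files come from. The order of checks is: BOM, then NUL-byte patterns typical of BOM-less UTF-16, then strict UTF-8 validation, then the single-byte fallback.

```python
import codecs


def guess_encoding(data: bytes, fallback: str = "cp1252") -> str:
    """Guess a codec name for raw file bytes. A heuristic: it can be wrong."""
    # 1. A BOM is the strongest evidence. The "utf-8-sig", "utf-16" and
    #    "utf-32" codecs consume the BOM while decoding.
    for bom, name in (
        (codecs.BOM_UTF32_LE, "utf-32"),
        (codecs.BOM_UTF32_BE, "utf-32"),
        (codecs.BOM_UTF8, "utf-8-sig"),
        (codecs.BOM_UTF16_LE, "utf-16"),
        (codecs.BOM_UTF16_BE, "utf-16"),
    ):
        if data.startswith(bom):
            return name

    # 2. Mostly-Latin text in UTF-16 has NUL bytes in every other position.
    #    This must run before the UTF-8 test, because NUL is valid UTF-8.
    if len(data) >= 2 and len(data) % 2 == 0:
        half = len(data) // 2
        even_nuls = data[0::2].count(0)
        odd_nuls = data[1::2].count(0)
        if odd_nuls > 0.4 * half and even_nuls == 0:  # assumed threshold
            return "utf-16-le"
        if even_nuls > 0.4 * half and odd_nuls == 0:
            return "utf-16-be"

    # 3. Text in a single-byte code page with non-ASCII characters is
    #    rarely valid UTF-8, so a strict decode is a good discriminator.
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        pass

    # 4. Nothing matched: assume a single-byte code page.
    return fallback


def decode_text(data: bytes) -> str:
    """Decode with the guessed encoding, replacing any undecodable bytes."""
    return data.decode(guess_encoding(data), errors="replace")


print(guess_encoding("héllo".encode("utf-8")))
print(guess_encoding("héllo".encode("cp1252")))
print(guess_encoding("hello".encode("utf-16-le")))
```

The same sequence of checks could be ported to Delphi by reading the file into a byte array, choosing the encoding, and assigning the decoded string to the list's `Text` property. Pure ASCII input is reported as UTF-8 here, which is harmless because ASCII decodes identically under UTF-8 and the common single-byte code pages.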

David Heffernan