
Please consider the following problem.

I'm writing a quick Manipulate[] program to display a ton of information, but I am running into a problem with Unicode characters. Here is what I currently have as input and output:

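```mathematica
Manipulate[
 request = filenumber <> "*";     (* file-name pattern, e.g. "001*" *)
 filenames = FileNames[request];  (* matching files in the current directory *)
 display = Import[type, "List"];  (* type is the file picked in the PopupMenu *)
 Short[display, 25]               (* abbreviated view, about 25 lines *)
 , {filenumber, "001", InputField}, {type, filenames, PopupMenu}]
```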

[Screenshot: output of `Import[type, "List"]`; the accented characters are garbled]

The problem is that the French-language accents are showing up oddly. The quick workaround I thought of was to change my code to `Import[type, "Plaintext"]`. That fixes the accents, but it displays each entry on its own line, like so:

[Screenshot: output of `Import[type, "Plaintext"]`; the accents are correct, but each entry is on its own line]

What would you suggest to get the correct accents of the second example together with the compact list format of the first, so that the output wraps across the line rather than breaking after each entry?

As an aside, and probably just as important as the question itself: could anybody explain why importing as a "List" distorts the Unicode characters? I've had a lot of trouble working around this, and understanding the underlying behaviour might help me move forward more quickly.

canadian_scholar
  • Did you try `Import[filename, "List", CharacterEncoding -> "UTF8"]`? – Sasha Dec 02 '11 at 02:39
  • @Sasha This is great. `CharacterEncoding -> "UTF8"` will be widely used by me. Do you want to put that in an answer so I can upvote it? – canadian_scholar Dec 02 '11 at 02:43
  • @Sasha I've had this encoding problem and have been doing workarounds that could have been pretty quickly avoided. I figured it was intrinsic to the type of Import, but I guess not. :) – canadian_scholar Dec 02 '11 at 02:45
  • Sorry to all for what turned out to be a simple question: I'd just never heard of the `CharacterEncoding` option before. Very glad I asked! – canadian_scholar Dec 02 '11 at 03:12
  • Ian, I asked a similar question a few days ago when playing with your datafile, and also linked to it in a comment on your original question: http://stackoverflow.com/questions/8254429/reading-utf-8-text-files-with-readlist The answer shows you how to fix encodings in already imported strings, and I mention the `CharacterEncoding` option of `Import` in the question itself. – Szabolcs Dec 02 '11 at 09:53
  • @Szabolcs How embarrassing, especially as I had read that post (I was particularly curious about the timings). I probably shouldn't admit this, but I was so focused on seeing it as an artifact of `Import`ing as `Plaintext` or `List` that I never saw it as an encoding issue. Now that I get it, I immediately see how useful `ToCharacterCode` is as well. I just hadn't connected the dots. – canadian_scholar Dec 02 '11 at 13:12
  • @Ian [It happens to all of us](http://stackoverflow.com/questions/8326258/install-mathlink-program-with-arbitrary-path-environment), but I wanted to have that question linked from here anyway (it appears in the sidebar now). – Szabolcs Dec 02 '11 at 13:37
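The encoding mix-up discussed in these comments can be reproduced in a few lines. This is a minimal sketch, assuming the file was saved as UTF-8 and then read one byte per character (as an ISO Latin-1 reader would); it also shows the `ToCharacterCode`/`FromCharacterCode` repair for strings that were already imported with the wrong encoding. The helper name `fixUTF8` is hypothetical, chosen only for illustration.

```mathematica
(* "é" (\[EAcute], U+00E9) is stored in a UTF-8 file as two bytes *)
bytes = ToCharacterCode["\[EAcute]", "UTF8"]
(* {195, 169} *)

(* Reading each byte as its own character turns one letter into two ("Ã©") *)
garbled = FromCharacterCode[bytes]

(* Hypothetical helper: recover the raw bytes from a mis-decoded string and
   decode them again as UTF-8. Only meaningful if every code is below 256. *)
fixUTF8[s_String] := FromCharacterCode[ToCharacterCode[s], "UTF8"]

fixUTF8[garbled] === "\[EAcute]"
(* True *)
```

The repair only works because the wrong decoding kept one character per byte; importing with the right `CharacterEncoding` in the first place avoids the round trip entirely.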


Although `Import` has no options of its own, it accepts options that are specific to the format being imported. In particular, see the Options section of ref/Format/List for the options that apply to the "List" format.

In the case at hand, you can indicate the file encoding with CharacterEncoding->"UTF8":

```mathematica
Import[filename, "List", CharacterEncoding -> "UTF8"]
```
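In the context of the question, only the `Import` call inside the `Manipulate` needs to change. A sketch, assuming the files matching the pattern are UTF-8 encoded; because the data is still imported as a "List", `Short` should show it in the same compact, wrapped form as before:

```mathematica
Manipulate[
 request = filenumber <> "*";
 filenames = FileNames[request];
 (* Decode the selected file as UTF-8 so accented characters display correctly *)
 display = Import[type, "List", CharacterEncoding -> "UTF8"];
 Short[display, 25]
 , {filenumber, "001", InputField}, {type, filenames, PopupMenu}]
```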
Sasha
  • As noted above, this is great. I don't have a file handy that has been giving me trouble, but the `CharacterEncoding` docs mention `Put` and `Get`. Is this option versatile enough to work with `Export[]` as well? – canadian_scholar Dec 02 '11 at 02:51
  • @HarmonicesMundi UTF encoding accommodates (almost) all the characters in existence, so it should handle it. The list of available and supported encodings can be accessed via `$CharacterEncodings`. From that list I would try `"WindowsEastEurope"` or `"UTF8"`. – Sasha Dec 02 '11 at 13:53
  • @HarmonicesMundi It seems like you are using v7. I believe this problem is fixed in v8. Anyhow, you better report your issue to support at wolfram dot com. – Sasha Dec 05 '11 at 14:20
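On the `Export` question above: for text formats the same option can also be given to `Export`. A round-trip sketch, assuming the "List" export format accepts `CharacterEncoding` in your version; the temporary file name is hypothetical and any writable path works:

```mathematica
(* Hypothetical temporary file used only for this round trip *)
file = FileNameJoin[{$TemporaryDirectory, "encoding-roundtrip.txt"}];
data = {"caf\[EAcute]", "\[EAcute]l\[EGrave]ve", "fa\[CCedilla]ade"};

(* Check that the encoding name is one this installation knows about *)
MemberQ[$CharacterEncodings, "UTF8"]

(* Write one string per line, encoded as UTF-8 ... *)
Export[file, data, "List", CharacterEncoding -> "UTF8"];

(* ... and read it back with the same encoding *)
Import[file, "List", CharacterEncoding -> "UTF8"] === data
```

Using the same encoding name on both sides is what matters; mixing them reproduces the garbling described in the question.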