
I am exporting a file via an HTTP GET response, using ASP.NET Web API. To do that, I am returning a FileContentResult object, as in:

return File(Encoding.UTF8.GetBytes(fileContents.ToString()), "text/plain; charset=UTF-8");

After being stuck on encoding issues for several minutes, I used Google's Advanced REST Client to perform the GET against the Web API controller's action, and the file downloads just fine.

Well, not exactly. I originally wanted it to be sent/downloaded as a .csv file. If I set the HTTP request's content type to "text/csv" and the File() call sets the response's content type to "text/csv" as well, Advanced REST Client will show the contents properly, but Excel will open it as gibberish.

If I simply change the content type to "text/plain", save it as a .txt file (I have to rename it after saving; I don't know why it is saved as _.text-plain, while as a .csv it is saved with the .csv extension), and finally perform an import in Excel as described in Excel Import Text Wizard, then Excel opens the file correctly.

Why is the .csv opened as gibberish, while the .txt is not? When opening a .csv there is no import wizard like the one for a .txt file (not that I am aware of).

Providing a bit of the source below:

StringBuilder fileContents = new StringBuilder();

//csv header
fileContents.AppendLine(String.Join(CultureInfo.CurrentCulture.TextInfo.ListSeparator, fileData.Select(fileRecord => fileRecord.Name)));

//csv records
foreach (ExportFileField fileField in fileData)
    fileContents.AppendLine(fileField.Value);

return File(Encoding.UTF8.GetBytes(fileContents.ToString()), "text/plain; charset=UTF-8");
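
(An aside on the header line above: `ListSeparator` is culture-dependent, which turns out to matter in the comments and answers below. A minimal sketch, with cultures chosen purely for illustration:)

// Sketch: ListSeparator is not always a comma.
using System;
using System.Globalization;

class ListSeparatorDemo
{
    static void Main()
    {
        // en-US uses "," as the list separator...
        Console.WriteLine(new CultureInfo("en-US").TextInfo.ListSeparator); // ","
        // ...while many European locales use ";", because "," is their decimal separator.
        Console.WriteLine(new CultureInfo("de-DE").TextInfo.ListSeparator); // ";"
    }
}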

As requested, the binary contents of both files. The text/plain (.txt) version (the one that Excel will open correctly, using import):

and the .csv one (the one that Excel will open with junk data):

(The files are the same; the cropping of the screenshots was not.)

Veverke
  • What is `fileContents` here? And what do you mean by "gibberish"? What *exactly* is the content of the file? (Don't open it in Excel - open it in a binary file editor or something that will show you the exact bytes.) – Jon Skeet May 12 '15 at 15:14
  • @JonSkeet: a StringBuilder instance. – Veverke May 12 '15 at 15:14
  • @JonSkeet: I just learned that Visual Studio offers a binary editor. I am adding screenshots with the file *binary* contents, in both text-plain (.txt) and .csv formats. The data looks the same. – Veverke May 12 '15 at 15:33
  • Right, so it sounds like the file contents are fine - it's just that Excel won't open the CSV file as you expect it to. If that's the case, that's not really a C# issue... (If you build the same file manually, in Notepad, presumably you'll see the same result...) – Jon Skeet May 12 '15 at 15:34
  • @JonSkeet: does this mean I can find a solution in another forum (SuperUser)? Or is there not likely a solution? – Veverke May 12 '15 at 15:38
  • The content you've shown doesn't look like decent text, unless it really is meant to be entirely non-ASCII apart from the commas... – Jon Skeet May 12 '15 at 15:49
  • @JonSkeet: the content represents text in Hebrew – Veverke May 12 '15 at 15:50
  • Okay, that makes sense then. Presumably you can explicitly import the .CSV file into Excel too? – Jon Skeet May 12 '15 at 15:51
  • @JonSkeet: that's my problem (I should have added this to the question text) - were I able to do so, I would be OK. But I am not. Unless there is a way which I do not know. – Veverke May 12 '15 at 15:51
  • So if you open Excel and go through the *exact same steps* (including specifying the encoding, if possible) as you do for the .txt file, but for the .csv file, you get a different result? It does sound like this should be in Superuser... – Jon Skeet May 12 '15 at 16:05
  • @JonSkeet: I will move it to SuperUser. Yes, that's what happens (there is no way to set the encoding when opening a .csv, as far as I know) – Veverke May 12 '15 at 16:14
  • Looking at it from a different angle - is there a good reason for exporting the data as a .csv file at all? If it is intended that the exported file be opened in Excel you could use a free library such as EPPlus to export a .xlsx file which should open in Excel ok without having to explicitly import text data, set encoding, etc. – Stewart_R May 12 '15 at 17:53
  • @Stewart_R: thanks for the insight, but I do not think the client wants the user to worry about having this or that software installed. – Veverke May 13 '15 at 09:26
  • No worries - just a thought. For clarity though, the client wouldn't need to have the software installed. EPPlus is a NuGet package you would reference in your Web API project. Your Web API would then return the .xlsx file rather than a .csv (see the sketch after these comments). – Stewart_R May 13 '15 at 09:30
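
A rough sketch of the approach Stewart_R describes (hedged: the EPPlus calls are the commonly used ones, but the worksheet layout, method name and sample values are assumptions):

// Hedged sketch of the EPPlus suggestion above; assumes the EPPlus NuGet
// package and a controller exposing File(...), like the question's code.
using OfficeOpenXml;

public FileContentResult ExportXlsx()
{
    using (var package = new ExcelPackage())
    {
        var sheet = package.Workbook.Worksheets.Add("Export");
        sheet.Cells[1, 1].Value = "Name";  // header cell
        sheet.Cells[2, 1].Value = "שלום";  // Hebrew text needs no codepage tricks in .xlsx
        return File(package.GetAsByteArray(),
            "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
            "export.xlsx");
    }
}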

3 Answers


To my surprise, performing the request via a browser instead of Google's Advanced REST Client and clicking on the file that is downloaded just works! Excel opens it correctly. So the problem must be with ARC.

In any case, since the process is not going to be done using an HTTP client other than a browser... my problem is gone. Again, in ARC's output screen the file is displayed correctly. I do not know why, upon clicking it to be opened in Excel, it "gets corrupted".

Strange.

Veverke
  • It probably *didn't* get corrupted. Excel *imports* the text data in both cases. When you *import* you get to select the file's codepage. When you double-click, Excel will use your regional settings to decide which encoding, column and decimal separator to use. E.g. for countries that use `,` as the decimal separator, `;` is used as the column separator. This always results in trouble when opening CSVs. Either that, or your client saved the file as ASCII instead of UTF8, or it removed the BOM that would tell Excel this is a UTF8 file – Panagiotis Kanavos May 14 '15 at 11:31
  • I was able to repro - the problem *is* caused by the regional settings. – Panagiotis Kanavos May 14 '15 at 11:41
  • Writing it up right now - it's not that simple, nor is it a bug. – Panagiotis Kanavos May 14 '15 at 11:46
  • @PanagiotisKanavos: I do not think "it removed the BOM" is what happened, since in any case I am returning the file with `Encoding.UTF8.GetPreamble().Concat(byteData).ToArray()` (spelled out below), which, according to [this](http://stackoverflow.com/questions/4414088/how-to-getbytes-in-c-sharp-with-utf8-encoding-with-bom) post, should fix such a problem. – Veverke May 14 '15 at 11:48
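
For reference, the BOM-prepending approach from the last comment, spelled out as a sketch (variable names mirror the question's snippet):

// Sketch: prepend the UTF-8 BOM (EF BB BF) so Excel has a chance to
// detect the encoding when the .csv is opened directly.
using System.Linq;
using System.Text;

byte[] byteData = Encoding.UTF8.GetBytes(fileContents.ToString());
byte[] withBom = Encoding.UTF8.GetPreamble().Concat(byteData).ToArray();
return File(withBom, "text/csv; charset=UTF-8");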

I was able to reproduce the issue by saving a file containing Greek characters with a BOM. Double-clicking attempts to import the file using the system's locale (Greek). When manually importing, Excel detects the codepage and offers to use the 65001 (UTF8) codepage.

This behavior is strange but not a bug. Text files contain no indication that would help detect their codepage, nor is it possible to guess one reliably. An ASCII file containing only A-Z characters saved as 1252 is identical to one saved using 1253. That's why Windows uses the system codepage, which is the locale used for all non-Unicode programs and files.

When you double-click on a text file, Excel can't ask you for the correct encoding - this would get tedious very quickly. Instead, it opens the file using your regional settings and the system codepage. ASCII files created on your machine are saved using your system's codepage, so this behavior is logical. Files given to you by non-programmers will probably be saved using your country's codepage as well. Programmers typically switch everything to US English, and that's how problems start. Your REST client may have saved the text as ASCII using the Latin encoding most programmers use.

When you import the text file into an empty sheet though, Excel can ask you what to do. It tries to detect the codepage by checking for a BOM or a codepage that may match the file's contents, and presents its guess in the import dialog box, together with a preview. The decimal and column separators are still those provided by your regional settings (it can't guess those). Unicode is generally easy to guess - the file starts with a BOM, or (in UTF16) contains NUL bytes.

ANSI codepages are harder though. Saving my Greek file as plain ANSI results in a Japanese guess. That's English humour for you, I guess.
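
A sketch of that reproduction (the Greek sample row is illustrative):

// Sketch: write the same row twice, with and without a UTF-8 BOM, then
// double-click each .csv and compare against a manual import into Excel.
using System.IO;
using System.Text;

class BomRepro
{
    static void Main()
    {
        string row = "Όνομα,Τιμή";  // illustrative Greek header row

        // UTF8Encoding(true) emits the EF BB BF preamble; (false) omits it.
        File.WriteAllText("with-bom.csv", row, new UTF8Encoding(true));
        File.WriteAllText("without-bom.csv", row, new UTF8Encoding(false));
    }
}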

Panagiotis Kanavos

The binary contents of the file show a correctly UTF-8 encoded CSV file with Hebrew characters. If, as you state in the comments, Excel does not allow you to change its guessed file encoding when opening a CSV file, that is rather a misbehavior in Excel itself (call it a bug if you want).

Your options are: use LibreOffice (http://www.libreoffice.org/), whose spreadsheet component does allow you to customize the settings for opening a CSV file.

Another one is to write a small program to explicitly convert your file to the encoding Excel is expecting - if you have a Python 3 interpreter installed, you could for example type:

python -c "open('correct.csv', 'wt', encoding='cp1255').write(open('utf8.csv', encoding='utf8').read())"

However, if your default Windows encoding for handling Hebrew is not cp1255, as I suppose above, that won't help Excel; it will just give you different gibberish :-) In that case, you should resort to using programs that can correctly deal with different encodings.

(NB: there is a Python call that returns the default system encoding on Windows, but I forget which it is, and it is not easily googleable.)
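
For completeness, a .NET equivalent of the same conversion (a sketch; codepage 1255 assumed, as above):

// Sketch: re-encode the UTF-8 CSV as Windows-1255 (Hebrew).
// On .NET Core, register CodePagesEncodingProvider before calling GetEncoding(1255).
using System.IO;
using System.Text;

class ReEncode
{
    static void Main()
    {
        string text = File.ReadAllText("utf8.csv", Encoding.UTF8);
        File.WriteAllText("correct.csv", text, Encoding.GetEncoding(1255));
    }
}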

jsbueno
  • As Occam's Razor would suggest, before assuming no Israeli user has ever opened a CSV file using Excel, try to find a simpler explanation. In this case, the OP was using an intermediate tool. In other cases, Excel uses the user's regional settings to open a file, resulting in problems for e.g. French, German, or Greek locale users where `,` is the decimal separator and `;` the column separator. Actually, I *dropped* Libre a couple of years ago because it couldn't handle Unicode as well as Word did at the time (polytonic Greek characters, that is) – Panagiotis Kanavos May 14 '15 at 11:27