
I have a CSV that is encoded as UTF-32. When I stream it in IE and open it with Excel, I can read everything. On the iPad, I stream it and get a blank page with no content whatsoever. (I don't know how to view source on the iPad, so there could be something hidden in the HTML.)

The HTTP response is written in ASP.NET (C#):

Response.Clear();
Response.Buffer = true;

Response.ContentType = "text/comma-separated-values";
Response.AddHeader("Content-Disposition", "attachment;filename=\"InventoryCount.csv\"");

Response.RedirectLocation = "InventoryCount.csv";
Response.ContentEncoding = Encoding.UTF32;//works on Excel wrong in iPad
//Response.ContentEncoding = Encoding.UTF8;//works on iPad wrong in Excel

Response.Charset = "UTF-8";//tried also adding Charset just to see if it works somehow, but it does not.
EnableViewState = false;

NMDUtilities.Export oUtilities = new NMDUtilities.Export();

Response.Write(oUtilities.DataGridToCSV(gvExport, ","));

Response.End();

The only guess I can make is that the iPad cannot read UTF-32. Is that true? And how can I view source on the iPad?


UPDATE
I just made an interesting discovery. When my encoding is UTF8 things work on iPad and characters are displayed properly, but Excel messes up a character. But when I use UTF32 the inverse is true. iPad displays nothing, but Excel works perfectly. I really have no idea what I can do about this.

iPad UTF8 outputs = " Quattrode® "
Excel UTF8 outputs = " Quattrode® "

iPad UTF32 outputs = " "
Excel UTF32 outputs = " Quattrode® "

Here's my implementation of DataGridToCsv

public string DataGridToCsv(GridView input, string delimiter)
{
    StringBuilder sb = new StringBuilder();

    //iterate Gridview and put row results in stringbuilder...
    string result = HttpUtility.HtmlDecode(sb.ToString());
    return result;
}


UPDATE2 Excel is barfing on UTF8 >:{. Man. I just undid the second option he lists because it doesn't work on iPad. I can't win for losing on this one.

UPDATE3
Per your suggestions I have looked at the hex code. There is no BOM, but there is a difference between the file layouts.

UTF8
4D 61 74 65 ("Mate", the first four letters of the first word, "Material")
UTF32
4D 00 00 00 ("M", the first letter of the same word)

So it looks like UTF-32 uses 32 bits (4 bytes) per character, while UTF-8 uses 8 bits (1 byte) for these ASCII characters. I think this is why Excel can guess the encoding. Now I will try your suggested fixes.
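The layout difference can be reproduced with a minimal sketch like the one below. The class and method names are made up for illustration. It assumes .NET's `Encoding.UTF32`, which is little-endian, and relies on the fact that `GetBytes` does not prepend a BOM.

```csharp
using System;
using System.Text;

public static class EncodingLayoutDemo
{
    // Hypothetical helper: encodes the text with the given encoding and
    // returns the bytes as space-separated uppercase hex, e.g. "4D 61".
    public static string ToHex(Encoding encoding, string text)
    {
        byte[] bytes = encoding.GetBytes(text); // no BOM is added here
        return BitConverter.ToString(bytes).Replace("-", " ");
    }
}
```

`EncodingLayoutDemo.ToHex(Encoding.UTF8, "Mate")` gives `4D 61 74 65`, while `EncodingLayoutDemo.ToHex(Encoding.UTF32, "M")` gives `4D 00 00 00`, which matches the two dumps above.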

P.Brian.Mackey
  • @P.Brian I think the iPad uses a browser similar to Google Chrome and Safari. Try testing in those on your PC to see the results. http://www.apple.com/safari/ – Aristos Jun 23 '11 at 16:37
  • @Aristos - Just ran it in Safari 5.0.5 for the desktop on Windows XP and it works fine. I do not have a Mac, only an iPad. – P.Brian.Mackey Jun 23 '11 at 16:39
  • What is the output of DataGridToCSV? A string? A stream? A byte[]? It's not enough to set Response.ContentEncoding if the content itself is not encoded accordingly. – Steve B Jun 23 '11 at 18:59
  • Is that `Response.Write(string)` by chance?? –  Jun 23 '11 at 19:00
  • @Joel Mueller - With UTF-16, Excel fails with various character display problems and the iPad fails by displaying nothing. – P.Brian.Mackey Jun 23 '11 at 20:32
  • Excel has no way of knowing that your CSV file is UTF-8. The browser knows, but that information is lost when the data is saved to a file and opened by Excel. If you put the BOM (byte order mark) for UTF-8 `EF BB BF` at the beginning of the file, Excel may be able to figure it out. – Gabe Jun 23 '11 at 20:39
  • @Gabe - I don't quite understand what you mean. I am able to specify UTF32 and Excel somehow is able to perform properly, but it cannot when I specify UTF8. So the encoding information must be present in some form for the change in encoding to even make a difference. – P.Brian.Mackey Jun 23 '11 at 20:50
  • What are the first 4 bytes in your output stream when sending as UTF-8? How about when using UTF-32? – Gabe Jun 23 '11 at 20:55

2 Answers


The problem is that the browser knows your data's encoding is UTF-8, but it has no way of telling Excel. When Excel opens the file, it assumes your system's default encoding. If you copy some non-ASCII text, paste it in Notepad, and save it with UTF-8 encoding, though, you'll see that Excel can properly detect it. It works on the iPad because its default encoding just happens to be UTF-8.

The reason is that Notepad puts the proper byte order mark (EF BB BF for UTF-8) in the beginning of the file. You can try it yourself by using a hex editor or some other means to create a file containing

EF BB BF 20 51 75 61 74 74 72 6F 64 65 C2 AE 20

and opening that file in Excel. (I used Excel 2010, but I assume it would work with all recent versions.)

Try making sure your output starts with those first 3 bytes.


How to write a BOM in C#
    byte[] BOM = new byte[] { 0xef, 0xbb, 0xbf };
    Response.BinaryWrite(BOM);//write the BOM first
    Response.Write(utility.DataGridToCSV(gvExport, ","));//then write your CSV
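As a variation on the snippet above, the BOM bytes do not have to be hard-coded: .NET exposes them through `Encoding.GetPreamble()`. The following is only a sketch, and the class and method names are hypothetical. It builds the whole payload (BOM plus UTF-8 bytes) as one array, so the BOM and the body cannot end up in different encodings.

```csharp
using System;
using System.Text;

public static class CsvBomSketch
{
    // Hypothetical helper: encodes the CSV text as UTF-8 and prepends
    // the UTF-8 byte order mark (EF BB BF).
    public static byte[] EncodeWithBom(string csv)
    {
        // A UTF8Encoding created with 'true' reports the BOM via GetPreamble();
        // GetBytes() on its own never emits it.
        var utf8 = new UTF8Encoding(true);
        byte[] preamble = utf8.GetPreamble();
        byte[] body = utf8.GetBytes(csv);

        // Concatenate preamble + body into a single buffer.
        byte[] result = new byte[preamble.Length + body.Length];
        Buffer.BlockCopy(preamble, 0, result, 0, preamble.Length);
        Buffer.BlockCopy(body, 0, result, preamble.Length, body.Length);
        return result;
    }
}
```

For the input `" Quattrode\u00AE "` this produces `EF BB BF 20 51 75 61 74 74 72 6F 64 65 C2 AE 20`, the same bytes as the test file described earlier. In the page code, the resulting array could presumably be sent with a single `Response.BinaryWrite(...)` call in place of the two separate writes.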
Gabe
  • Opening the file with HxD, the first four bytes are `72 00 00 00`, which I believe is the character 'r' in ASCII/UTF-8. I still don't understand how this can be the solution when specifying an Encoding solves the problem. – P.Brian.Mackey Jun 23 '11 at 21:08
  • @P.Brian: I'm assuming that you get `72 00 00 00` when you encode as UTF-32, which implies to me that the BOM is not being inserted and Excel is just guessing the right encoding because the file doesn't make sense as ANSI. What are the first 4 bytes when encoded as UTF-8? – Gabe Jun 23 '11 at 21:14

Excel tries to infer the encoding based on your file contents, and UTF-8 encodes the first 128 code points (the ASCII range: unaccented letters, digits, and basic punctuation) with exactly the same bytes as ASCII. When you use UTF-16 or UTF-32, it can figure out that the content isn't ASCII. But since most of your UTF-8 content is byte-for-byte identical to ASCII, if you want the file to be read as UTF-8, you have to tell Excel explicitly by writing the byte order mark, as Gabe said in his answer. Also, see the answer by Andrew Csontos on this other question:

What's the best way to export UTF8 data into Excel?
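The overlap, and the garbled character from the question, can be reproduced with a small sketch. It assumes the reader falls back to a Western single-byte code page. ISO-8859-1 is used here purely as a stand-in for that default, since which code page Excel actually picked is not known. The class and method names are hypothetical.

```csharp
using System;
using System.Text;

public static class MojibakeDemo
{
    // Hypothetical helper: encodes the text as UTF-8, then decodes those
    // bytes as ISO-8859-1, imitating a reader that guesses the wrong encoding.
    public static string Utf8ReadAsLatin1(string text)
    {
        byte[] utf8Bytes = Encoding.UTF8.GetBytes(text);
        // Assumption: Latin-1 stands in for the system's default "ANSI" code page.
        return Encoding.GetEncoding("iso-8859-1").GetString(utf8Bytes);
    }
}
```

Plain ASCII survives the wrong guess: `MojibakeDemo.Utf8ReadAsLatin1("Quattrode")` returns `Quattrode` unchanged. The registered sign does not: `MojibakeDemo.Utf8ReadAsLatin1("Quattrode\u00AE")` returns `Quattrode®`, because the two UTF-8 bytes `C2 AE` are read as two separate characters.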

Joel C
  • The BOM works like a charm and is much simpler than my original solution. Both you and Gabe deserve answer credit, but I can only give it to one of you. Thanks guys. – P.Brian.Mackey Jun 24 '11 at 13:53