
I'm trying to bring in a CSV file for some JavaScript to munch on and spit out on an HTML page. The CSV has some special characters like ½ and ×. According to Firebug, when I put a breakpoint inside the callback of $.get(), the special characters are already missing at that point. They are replaced with some sort of whitespace that displays as a question mark or box if I copy and paste it into another program.

I have tried

$.ajaxSetup({ 
    dataType: "text" , 
    contentType: "text/plain; charset=utf-8"
});

and other variations. The charset of my webpage is UTF-8; I have also tried ISO-8859-1. Nothing has worked so far.
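Why the `contentType` setting above changes nothing: in jQuery it only labels the data sent *to* the server. The text of the response is decoded using the charset in the response's `Content-Type` header, which `overrideMimeType` can replace, with UTF-8 as the fallback when neither names one (a byte order mark in the file, if present, takes precedence over all of these). The sketch below models that rule as the current XHR standard describes it; 2011-era browsers may have behaved differently, the function names are hypothetical, and the header parsing is simplified.

```javascript
// Hypothetical helpers modelling how an XHR text response picks its charset
// (simplified from the XHR and Encoding standards; BOM sniffing not shown).

// Return the lowercased charset parameter of a Content-Type value, or null.
function charsetFromContentType(value) {
  if (!value) return null;
  const match = /;\s*charset\s*=\s*"?([^";\s]+)"?/i.exec(value);
  return match ? match[1].toLowerCase() : null;
}

// A charset given via overrideMimeType wins over the server's header;
// with neither, the response text is decoded as UTF-8.
// Note that the request's contentType never enters into it.
function responseCharset(responseContentType, overrideMimeType) {
  return (
    charsetFromContentType(overrideMimeType) ||
    charsetFromContentType(responseContentType) ||
    "utf-8"
  );
}

console.log(responseCharset("text/plain", null)); // utf-8
console.log(responseCharset("text/plain", "text/plain; charset=windows-1252")); // windows-1252
```

Applied to this question: if the file cannot be re-saved, the request can be told its real encoding instead. jQuery's `beforeSend` callback receives the XHR object, so passing `beforeSend: function (xhr) { xhr.overrideMimeType("text/plain; charset=windows-1252"); }` to `$.ajax` should work in browsers that implement `overrideMimeType`; `windows-1252` here is an assumption about how the file was saved.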

EDIT: Placing the characters by hand into the HTML, either as-is or as HTML entity codes, works fine. Placing them with JavaScript works too. The only problem is reading this CSV file.

EDIT2: Try this. Create a text file containing Öç¼». Then create a webpage like so:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />

<script type="text/javascript" src="https://ajax.googleapis.com/ajax/libs/jquery/1.4.4/jquery.min.js"></script>
<script type="text/javascript">
    $.get("encodeme.txt", function(data){
        console.log(data);
    });
</script>

</head>

<body>
</body>
</html>

All that gets logged is a whitespace-like character followed by a Chinese character: �缻. Notice that the "whitespace" appears as a question-mark symbol when I copy and paste it.
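The logged `�缻` is exactly what the bytes of `Öç¼»` saved in Windows-1252 produce when they are decoded as UTF-8, which points at the file's encoding rather than at jQuery. A small check, assuming the text file was saved as Windows-1252 (what Windows editors call "ANSI" on Western systems). `TextDecoder` is the standard Encoding API, available in modern browsers and Node.js; the `windows-1252` label needs ICU data, which recent Node.js builds include by default.

```javascript
// "Öç¼»" as stored by an editor using Windows-1252 (one byte per character).
const ansiBytes = new Uint8Array([0xd6, 0xe7, 0xbc, 0xbb]);

// Read as UTF-8: 0xD6 opens a two-byte sequence, but 0xE7 is not a
// continuation byte, so 0xD6 becomes U+FFFD. E7 BC BB then happens to form
// a valid three-byte sequence, U+7F3B.
const asUtf8 = new TextDecoder("utf-8").decode(ansiBytes);

// Read with the encoding the file was actually saved in.
const asAnsi = new TextDecoder("windows-1252").decode(ansiBytes);

console.log(asUtf8); // �缻
console.log(asAnsi); // Öç¼»
```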

Moss
  • Have you tried serializing to JSON? – Justin Oct 12 '11 at 03:08
  • How are you setting the doctype? Is your server outputting a `Content-type: text/plain; charset=utf-8` header as well? If even a single stage of the whole file->webserver->client isn't set to UTF-8, you'll get mangled text coming out the end. – Marc B Oct 12 '11 at 03:12
  • So far I'm just running the page locally. The doctype is ` ` and the charset for the webpage is ``. – Moss Oct 12 '11 at 04:08

3 Answers


Blah! I should have seen this sooner. The problem was that the CSV file was encoded as ANSI (the Windows code page, typically Windows-1252), not UTF-8. I did briefly look at the file in Notepad++ and should have noticed the problem there, but I foolishly missed it the first time. I selected Format > Convert to UTF-8 in Notepad++ and now it works fine. So Marc B was closest to answering the question, although he didn't post it as an answer for some reason. Now, how to get OpenOffice to encode my files correctly...
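For more than a file or two, the same conversion can be scripted. This is a sketch, not what Notepad++ does internally: the function name is hypothetical, and it assumes that any input which is not valid UTF-8 is Windows-1252. That is a heuristic; pure ASCII is left alone (it is already valid UTF-8), and a Windows-1252 file whose bytes happen to also be valid UTF-8 would be left alone too.

```javascript
// Hypothetical helper: return UTF-8 bytes for a file's contents, assuming
// anything that is not valid UTF-8 was saved as Windows-1252 ("ANSI").
function toUtf8Bytes(bytes) {
  try {
    // fatal: true makes the decoder throw instead of inserting U+FFFD.
    new TextDecoder("utf-8", { fatal: true }).decode(bytes);
    return bytes; // already valid UTF-8: return it unchanged
  } catch (err) {
    if (!(err instanceof TypeError)) throw err; // only handle decoding errors
    const text = new TextDecoder("windows-1252").decode(bytes);
    return new TextEncoder().encode(text); // TextEncoder always emits UTF-8
  }
}
```

In Node.js this could be applied to a file with `fs.writeFileSync("data.csv", toUtf8Bytes(fs.readFileSync("data.csv")))`, where `data.csv` is a placeholder name; keep a backup, since the file is overwritten in place.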

Moss
  • Duh, OOo asks me what encoding to use every time I save a spreadsheet as CSV. It would be nice if I could make it default to UTF-8, though. – Moss Oct 12 '11 at 19:24

How about this

$.ajaxSetup({ 
    dataType: "text" ,  
    scriptCharset: "utf-8" , 
    contentType: "application/json; charset=utf-8"
});

I found this function here:

   function char_convert(){
  var chars = ["©","Û","®","ž","Ü","Ÿ","Ý","$","Þ","%","¡","ß","¢","à","£","á","À","¤","â","Á","¥","ã","Â","¦","ä","Ã","§","å","Ä","¨","æ","Å","©","ç","Æ","ª","è","Ç","«","é","È","¬","ê","É","­","ë","Ê","®","ì","Ë","¯","í","Ì","°","î","Í","±","ï","Î","²","ð","Ï","³","ñ","Ð","´","ò","Ñ","µ","ó","Õ","¶","ô","Ö","·","õ","Ø","¸","ö","Ù","¹","÷","Ú","º","ø","Û","»","ù","Ü","@","¼","ú","Ý","½","û","Þ","€","¾","ü","ß","¿","ý","à","‚","À","þ","á","ƒ","Á","ÿ","å","„","Â","æ","…","Ã","ç","†","Ä","è","‡","Å","é","ˆ","Æ","ê","‰","Ç","ë","Š","È","ì","‹","É","í","Œ","Ê","î","Ë","ï","Ž","Ì","ð","Í","ñ","Î","ò","‘","Ï","ó","’","Ð","ô","“","Ñ","õ","”","Ò","ö","•","Ó","ø","–","Ô","ù","—","Õ","ú","˜","Ö","û","™","×","ý","š","Ø","þ","›","Ù","ÿ","œ","Ú"]; 
  var codes = ["&copy;","&#219;","&reg;","&#158;","&#220;","&#159;","&#221;","&#36;","&#222;","&#37;","&#161;","&#223;","&#162;","&#224;","&#163;","&#225;","&Agrave;","&#164;","&#226;","&Aacute;","&#165;","&#227;","&Acirc;","&#166;","&#228;","&Atilde;","&#167;","&#229;","&Auml;","&#168;","&#230;","&Aring;","&#169;","&#231;","&AElig;","&#170;","&#232;","&Ccedil;","&#171;","&#233;","&Egrave;","&#172;","&#234;","&Eacute;","&#173;","&#235;","&Ecirc;","&#174;","&#236;","&Euml;","&#175;","&#237;","&Igrave;","&#176;","&#238;","&Iacute;","&#177;","&#239;","&Icirc;","&#178;","&#240;","&Iuml;","&#179;","&#241;","&ETH;","&#180;","&#242;","&Ntilde;","&#181;","&#243;","&Otilde;","&#182;","&#244;","&Ouml;","&#183;","&#245;","&Oslash;","&#184;","&#246;","&Ugrave;","&#185;","&#247;","&Uacute;","&#186;","&#248;","&Ucirc;","&#187;","&#249;","&Uuml;","&#64;","&#188;","&#250;","&Yacute;","&#189;","&#251;","&THORN;","&#128;","&#190;","&#252;","&szlig;","&#191;","&#253;","&agrave;","&#130;","&#192;","&#254;","&aacute;","&#131;","&#193;","&#255;","&aring;","&#132;","&#194;","&aelig;","&#133;","&#195;","&ccedil;","&#134;","&#196;","&egrave;","&#135;","&#197;","&eacute;","&#136;","&#198;","&ecirc;","&#137;","&#199;","&euml;","&#138;","&#200;","&igrave;","&#139;","&#201;","&iacute;","&#140;","&#202;","&icirc;","&#203;","&iuml;","&#142;","&#204;","&eth;","&#205;","&ntilde;","&#206;","&ograve;","&#145;","&#207;","&oacute;","&#146;","&#208;","&ocirc;","&#147;","&#209;","&otilde;","&#148;","&#210;","&ouml;","&#149;","&#211;","&oslash;","&#150;","&#212;","&ugrave;","&#151;","&#213;","&uacute;","&#152;","&#214;","&ucirc;","&#153;","&#215;","&yacute;","&#154;","&#216;","&thorn;","&#155;","&#217;","&yuml;","&#156;","&#218;"];
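  // var keeps x and i local instead of creating implicit globals
  for (var x = 0; x < chars.length; x++) {
   for (var i = 0; i < arguments.length; i++) {
    // split/join replaces every occurrence; replace() with a string pattern only replaces the first
    arguments[i].value = arguments[i].value.split(chars[x]).join(codes[x]);
   }
  }
 }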
kst
  • Sorry, still the same problem. – Moss Oct 12 '11 at 04:11
  • What is this dropbox thing? I already have a conversion function, but it doesn't even get a chance to work because it never sees those special characters. When jQuery gets the CSV file, the characters are already gone. – Moss Oct 12 '11 at 08:04

This is a classic character-encoding problem (I think). I never rely on anything beyond alphanumeric characters displaying correctly; everything else I escape. Even if your CSV comes back with the proper characters, they might still get mangled once you print them to the DOM. (I had a very nasty experience with French accented characters in properties files that took forever to fix, so I no longer take chances with exotic characters.)

Any characters in your HTML apart from A-Z, numbers, and basic punctuation should be escaped:

&eacute; makes é
&mdash; makes —
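Escaping does not have to be done by hand: a short function can turn every non-ASCII character into a numeric character reference such as `&#233;`. This is a sketch with a hypothetical name. It only helps after the text has been decoded correctly (it cannot bring back characters the decoder already replaced with U+FFFD), and it deliberately leaves ASCII alone, so `<` and `&` still need separate escaping if the result goes into `innerHTML`.

```javascript
// Hypothetical helper: replace every non-ASCII character with a numeric
// character reference, e.g. "é" -> "&#233;". ASCII passes through unchanged.
function escapeNonAscii(text) {
  let out = "";
  // for...of walks the string by code point, so a character outside the
  // Basic Multilingual Plane yields one reference rather than two.
  for (const ch of text) {
    const cp = ch.codePointAt(0);
    out += cp < 0x80 ? ch : "&#" + cp + ";";
  }
  return out;
}

console.log(escapeNonAscii("½ × é")); // &#189; &#215; &#233;
```

Once the text is decoded correctly, assigning it through `textContent` (or jQuery's `.text()`) displays it as-is, with no entities needed.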
  • I'm not sure what you are saying is your actual solution. How do you escape your exotic characters? – Moss Oct 12 '11 at 04:10
  • Sorry, looks like the example got mangled when I wrote it. I made an edit :) –  Oct 15 '11 at 22:27
  • Well, I certainly don't want to type in those codes by hand, or even have to run a search and replace on my source files. Everything works fine as long as everything is properly encoded and decoded as UTF-8. – Moss Oct 22 '11 at 23:31