
I'm writing a Chrome extension that works with a website that uses ISO-8859-1. To give some context, my extension makes posting in the site's forums quicker by adding a more convenient post form. The value of the textarea where the message is written is then sent through an Ajax call (using jQuery).

If the message contains characters like á, they appear as Ã¡ in the posted message. Forcing the browser to display the page as UTF-8 instead of ISO-8859-1 makes the á appear correctly.
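
A minimal sketch of how this symptom can arise, assuming the message leaves the browser as UTF-8 bytes and the site reads each byte as one ISO-8859-1 character. The helper name `misreadAsLatin1` is hypothetical and only for illustration:

```javascript
// Hypothetical helper: encode a string as UTF-8, then misread every byte
// as a single ISO-8859-1 character (assumed to be what the site does).
function misreadAsLatin1(text) {
  const utf8Bytes = new TextEncoder().encode(text); // "á" becomes [0xC3, 0xA1]
  // Bytes 0x00-0xFF map one-to-one onto U+0000-U+00FF in ISO-8859-1.
  return String.fromCharCode(...utf8Bytes);
}

console.log(misreadAsLatin1("á")); // "Ã¡"
```

Plain ASCII text passes through unchanged, which would explain why only accented characters look wrong.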

It is my understanding that JavaScript uses UTF-8 for its strings, so it is my theory that if I transcode the string to ISO-8859-1 before sending it, it should solve my problem. However, there seems to be no direct way to do this transcoding in JavaScript, and I can't touch the server-side code. Any advice?

I've tried setting the created form to use iso-8859-1 like this:

var form = document.createElement("form");
form.enctype = "application/x-www-form-urlencoded; charset=ISO-8859-1";

And also:

var form = document.createElement("form");
form.encoding = "ISO-8859-1";

But that doesn't seem to work.

EDIT:

The problem actually lay in how jQuery was URL-encoding the message (or something along the way). I fixed this by telling jQuery not to process the data and doing the encoding myself, as shown in the following snippet:

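function cfaqs_post_message(msg) {
  var url = cfaqs_build_post_url();
  // escape() writes characters below U+0100 as a single %XX byte, which
  // matches ISO-8859-1, but it leaves "+" untouched, so encode that by hand.
  msg = escape(msg).replace(/\+/g, "%2B");
  $.ajax({
    type: "POST",
    url: url,
    // Send the string as is instead of letting jQuery re-encode it.
    processData: false,
    data: "message=" + msg + "&post=Preview Message",
    success: function(html) {
      // ...
    },
    dataType: "html",
    contentType: "application/x-www-form-urlencoded"
  });
}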
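
The snippet above relies on `escape`, which is deprecated and writes characters above U+00FF as `%uXXXX`, a form most servers will not understand. A sketch of an explicit alternative follows. The name `encodeLatin1Component` is hypothetical (it is not part of jQuery or the browser), and rejecting characters outside ISO-8859-1 is an assumption about the desired behavior:

```javascript
// Hypothetical helper: percent-encode a string as ISO-8859-1 bytes for an
// application/x-www-form-urlencoded request body.
function encodeLatin1Component(text) {
  let out = "";
  for (const ch of text) {
    const code = ch.codePointAt(0);
    if (code > 0xff) {
      // Assumption: characters outside ISO-8859-1 are rejected, not replaced.
      throw new RangeError("Not representable in ISO-8859-1: " + ch);
    }
    if (/[A-Za-z0-9\-_.~]/.test(ch)) {
      out += ch; // letters, digits and - _ . ~ pass through unchanged
    } else if (ch === " ") {
      out += "+"; // form encoding writes a space as "+"
    } else {
      out += "%" + code.toString(16).toUpperCase().padStart(2, "0");
    }
  }
  return out;
}

console.log(encodeLatin1Component("á + b")); // "%E1+%2B+b"
console.log(encodeURIComponent("á"));        // "%C3%A1", always UTF-8
```

With a helper like this, the request body could be built as `"message=" + encodeLatin1Component(msg)` while keeping `processData: false`.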
dda
Marcos Marin

3 Answers


It is my understanding that JavaScript uses UTF-8 for its strings

No, no.
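
A quick way to check what a JavaScript string actually holds is to list its code units. The sketch below suggests they are 16-bit UTF-16 units rather than UTF-8 bytes; a byte encoding only comes into play when the string is serialized. The function name `codeUnits` is made up for illustration:

```javascript
// Hypothetical helper: list the UTF-16 code units of a string as hex.
function codeUnits(text) {
  const units = [];
  for (let i = 0; i < text.length; i++) {
    units.push(text.charCodeAt(i).toString(16).padStart(4, "0"));
  }
  return units;
}

console.log(codeUnits("á").join(" "));         // "00e1": one unit, not the UTF-8 bytes c3 a1
console.log(codeUnits("\u{1F600}").join(" ")); // "d83d de00": a surrogate pair
```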

Each page declares its charset encoding in a meta tag, just inside the head element

<head>
<meta http-equiv="content-type" content="text/html; charset=UTF-8"/>

or

<head>
<meta http-equiv="content-type" content="text/html; charset=ISO-8859-1"/>

Besides that, each page should be saved in the charset encoding it declares. Otherwise, it will not work as expected.

It is also a good idea to declare the target charset encoding on the server side.

Java
<%@page pageEncoding="UTF-8" contentType="text/html; charset=UTF-8"%>

PHP
header("Content-Type: text/html; charset=UTF-8");

C#
I do not know how to...

It can also be a good idea to declare the charset of each script file if it contains non-ASCII characters (á, é, í, ó, ú and so on).

<script type="text/javascript" charset="UTF-8" src="/PATH/TO/FILE.js"></script>

...

So it is my theory that if I transcode the string to ISO-8859-1 before sending it, it should solve my problem

No, no.

The target server may not decode the request the way your page encodes it. For instance, Tomcat handles requests as ISO-8859-1, no matter how you set up your page. So, on the server side, you may have to set the request encoding to match how you set up your page.

Java
request.setCharacterEncoding("UTF-8")

PHP
// I do not know how to...

If you really want to change the target charset encoding, try the following

InternetExplorer
    formElement.encoding = "application/x-www-form-urlencoded; charset=ISO-8859-1";
ELSE
    formElement.enctype  = "application/x-www-form-urlencoded; charset=ISO-8859-1";

Or you could provide a function that returns the Unicode escape sequence for each character. An escape sequence is plain ASCII, so it works regardless of the target charset encoding. For instance, the escape sequence for á is \u00E1.

alert("á without its Unicode Character Set numerical representation");
function convertToUnicodeCharacterSet(value) {
    if(value == "á")
        return "\u00E1";
}
alert("á Numerical representation in Unicode Character Set is: " + convertToUnicodeCharacterSet("á"));
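
The function above handles only á. A possible generalization, sketched here under the assumption that every non-ASCII character should be written as a `\uXXXX` escape, is shown below. The name `toUnicodeEscapes` is hypothetical:

```javascript
// Hypothetical generalization of convertToUnicodeCharacterSet: replace every
// non-ASCII UTF-16 code unit with the text of its \uXXXX escape sequence.
function toUnicodeEscapes(value) {
  return value.replace(/[^\x00-\x7F]/g, function (ch) {
    return "\\u" + ch.charCodeAt(0).toString(16).toUpperCase().padStart(4, "0");
  });
}

console.log(toUnicodeEscapes("á, é")); // \u00E1, \u00E9
```

The result contains only ASCII, so it can be pasted into a script file as a string literal without depending on how that file is saved or served.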


You can use this link as guideline (See JavaScript escapes)

Added to the original answer: how I implement the jQuery functionality

var dataArray = $(formElement).serializeArray();
var pairs = [];
for (var i = 0; i < dataArray.length; i++) {
    // encodeURIComponent always percent-encodes as UTF-8
    pairs.push(encodeURIComponent(dataArray[i]["name"]) + "=" + encodeURIComponent(dataArray[i]["value"]));
}
var queryString = pairs.join("&");
$.ajax({
    url: "url.htm",
    data: queryString,
    contentType: "application/x-www-form-urlencoded; charset=UTF-8",
    success: function(response) {
        // process response
    }
});

It works fine without any headache.

Regards,

Chrillewoodz
Arthur Ronald
  • Thanks for the informative answer, I'm marking it as correct even though this was not exactly the solution. My post didn't really give enough information to show the real issue. (I only found out about that after banging my head against the wall for a few more hours) – Marcos Marin Feb 21 '10 at 01:40
  • @Marcos Marin Added content to original answer – Arthur Ronald Feb 21 '10 at 02:02
  • For C# : <%@ Page RequestEncoding="utf-8" ResponseEncoding="utf-8" %> – Eduardo Fabricio Aug 02 '16 at 16:04

I had a very similar problem. I needed to pass a URL parameter using jQuery to make an Ajax call, and most of the time the parameter values included accents.

Both pages had to be set to charset=ISO-8859-1, and JavaScript's functions encodeURI, encodeURIComponent, etc. only use UTF-8.

What I did was to create a link in the original page, including all parameters without any encoding, let's say:

var myLink = document.getElementById("myHiddenLink");
myLink.setAttribute("href", "México, Perú, María and any other words with accents and spaces");

and then assign the href value to a variable, like this:

var theLink = myLink.getAttribute("href");

So in the end the value of the "theLink" variable was ISO-8859-1 encoded, and everything worked just fine.

Sergio

You can now decode bytes into a string using TextDecoder:

const decoded = new TextDecoder('windows-1252').decode(encoded)

Note that the Encoding API treats the label ISO-8859-1 as an alias for windows-1252. For more, check out https://developer.mozilla.org/en-US/docs/Web/API/Encoding_API/Encodings
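
TextDecoder only covers the bytes-to-string direction, and TextEncoder always produces UTF-8, so going from a string to ISO-8859-1 bytes needs a manual mapping. A minimal sketch, assuming strict ISO-8859-1 where byte N stands for code point N; the helper names `encodeLatin1` and `decodeLatin1` are hypothetical:

```javascript
// Hypothetical helper: turn a string into ISO-8859-1 bytes.
function encodeLatin1(text) {
  const bytes = new Uint8Array(text.length);
  for (let i = 0; i < text.length; i++) {
    const code = text.charCodeAt(i);
    if (code > 0xff) {
      // Assumption: characters outside ISO-8859-1 are rejected, not replaced.
      throw new RangeError("Not representable in ISO-8859-1 at index " + i);
    }
    bytes[i] = code;
  }
  return bytes;
}

// Hypothetical helper: turn ISO-8859-1 bytes back into a string.
function decodeLatin1(bytes) {
  let text = "";
  for (const b of bytes) {
    text += String.fromCharCode(b);
  }
  return text;
}

const latin1Bytes = encodeLatin1("María");
console.log(latin1Bytes.length);        // 5, one byte per character
console.log(decodeLatin1(latin1Bytes)); // "María"
```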

kigiri