
I have found that the character ‘ is not being correctly escaped; it is displayed as the character �.

//ajax the next page
var sendRequest = function(href, direct){
    pushState(href);
    jQuery.ajax({
        url: href,
        dataType: "html",
        cache: false,
        contentType: "text/html; charset=utf-8",
        success: function(data) {
            pushNewData(data, direct);
        }
    });
};

For example, the text ‘electric brain‘ within the element port_quote is rendered incorrectly.

<div id="port_quote">
    <p>"Toi is the Maori word for art and the literal translation of rorohiko, the Maori word for computer, is �electric brain�. Rorohiko is morphed into Rerehiko and Toi Rerehiko is a moving image art form immersed in Maori tradition, tikanga (custom) and values which uses digital and electronic media."</p>
</div>

The request headers are as follows...

Accept  text/html, */*; q=0.01
Accept-Encoding gzip, deflate
Accept-Language en-US,en;q=0.5
Content-Type    text/html; charset=utf-8

What can be done to fix the issue with some characters not being correctly escaped? I've thought about looping through the data variable and replacing each such character with its ASCII equivalent, but I'd prefer a much more generic approach.

classicjonesynz
  • That character is U+2018, LEFT SINGLE QUOTATION MARK. Do you need `charset=utf-8`? – Keith Thompson Apr 11 '13 at 01:02
  • Yeah, code was old, just edited the question, I was testing it with `iso-8859-1` to see if the same result would occur (and it did :(). – classicjonesynz Apr 11 '13 at 01:09
  • The same result *cannot* happen with `ISO-8859-1`. The jQuery `contentType` is the *request's* content type; what matters here is the server's, which you don't show. Anyway, the server is most likely sending data encoded in Windows-1252, definitely not UTF-8. What are the response headers? – Esailija Apr 11 '13 at 10:09
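
To illustrate the last comment, here is a minimal sketch of how a single Windows-1252 byte turns into � when it is read as UTF-8. It assumes the server sent the quote as the byte 0x91 (the Windows-1252 code for U+2018); the actual response bytes are not shown in the question, so that is an assumption. `TextDecoder` and `TextEncoder` are standard in browsers and recent Node.js versions.

```javascript
// Assumed input: 0x91 is the Windows-1252 byte for U+2018
// (LEFT SINGLE QUOTATION MARK). The real response bytes are not shown.
const cp1252Bytes = new Uint8Array([0x91]);

// Read as UTF-8, 0x91 is a lone continuation byte, so the decoder
// substitutes U+FFFD (REPLACEMENT CHARACTER), which is the "�" in the page.
const misread = new TextDecoder("utf-8").decode(cp1252Bytes);

// What a correctly encoded UTF-8 response would contain for U+2018.
const utf8Bytes = new TextEncoder().encode("\u2018");
const roundTrip = new TextDecoder("utf-8").decode(utf8Bytes);

console.log(misread === "\uFFFD");   // true
console.log(Array.from(utf8Bytes, b => b.toString(16))); // [ 'e2', '80', '98' ]
console.log(roundTrip === "\u2018"); // true
```

Once the bytes have been decoded into U+FFFD, the original character cannot be recovered on the client, which is why the fix belongs on the server side.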

2 Answers


You need to use HTML entities to display those characters:

http://www.danshort.com/HTMLentities/

That would be &lsquo; I believe.

To do the encoding, since you're already using jQuery you could use this method to encode the paragraph element's text

Alex W
  • The problem is that the `htmlentity` is not being [decoded](http://iforce.co.nz/i/vyseycqs.15z.png) correctly when the page is retrieved through the ajax function, but it is [decoded](http://iforce.co.nz/i/krfjjfm3.orf.png) correctly on page refresh. – classicjonesynz Apr 11 '13 at 01:27
  • It has to be character encoded as a single character to produce the results you're seeing. If it was an entity, the worst that could happen is that it would show the actual entity code within the text – Alex W Apr 11 '13 at 01:31
  • Thanks for the suggestion. I've been writing this jQuery to overlay on a CMS that was written by someone else, not by me. Your suggestion gives me the idea that `text` isn't being sanitized as UTF-8 before it reaches the database or during output, sigh. – classicjonesynz Apr 11 '13 at 01:39

I'm pretty sure this isn't an issue with the ajax request or any of the JavaScript. In fact, you don't need to include the "contentType" option in your call to .ajax(). Also, it's not a question of "escaping" but of "encoding." You need the server to properly encode the response so it is UTF-8. The server should also set the "Content-Type" response header to "text/html; charset=utf-8". First you could check what the response headers look like for the ajax response. But even if they are correct, that does not mean that the response was actually encoded properly. You'd have to check your server code to see what it is doing.
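
As a rough sketch of the first check suggested above, the charset declared in the response's Content-Type header can be read on the client. `charsetFromContentType` is a hypothetical helper written for this illustration, not part of jQuery; `jqXHR.getResponseHeader` is the jQuery call used to obtain the header. The commented usage reuses `href`, `direct` and `pushNewData` from the question.

```javascript
// Hypothetical helper (not part of jQuery): extract the charset parameter
// from a Content-Type header value, or return null if none is declared.
function charsetFromContentType(header) {
    if (!header) {
        return null;
    }
    var match = /;\s*charset\s*=\s*"?([^";\s]+)"?/i.exec(header);
    return match ? match[1].toLowerCase() : null;
}

// Usage inside the existing request (browser only, requires jQuery):
// jQuery.ajax({
//     url: href,
//     dataType: "html",
//     cache: false,
//     success: function(data, textStatus, jqXHR) {
//         var declared = charsetFromContentType(
//             jqXHR.getResponseHeader("Content-Type"));
//         if (declared !== "utf-8") {
//             console.warn("Response declares charset:", declared);
//         }
//         pushNewData(data, direct);
//     }
// });
```

A declared charset of utf-8 only shows what the server claims. As noted above, the body can still be encoded differently, so the server code has to be checked as well.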

John S