A client-side script takes the text within a text input, "wraps" it within an XML block and sends it to a server that stores the information in a MySQL database.

As a first step before wrapping the input value, I escape the "&" characters like so:

var copyright = copyright.replace(/&/g,"&amp;");

The resulting XML data block is sent to the server using jquery's ajax method:

var copyright = copyright.replace(/&/g,"&amp;"),
    xml = "<request><session>"+session+"</session><space>"+space_id+"</space><view>"+view_id+"</view><copyright>"+copyright+"</copyright></request>",
    url = "hidden URL";

    $.ajax({ 
        type: "POST", 
        url: url,
        contentType: "text/xml; charset=UTF-8", 
        dataType: "xml;charset=UTF-8",
        data: xml
    });

Later, the content that was saved in the database needs to be retrieved and displayed within a web page:

$.ajax({ 
    type: "POST", 
    url: url,
    dataType: 'xml',
    data: xmlString, 
    success: function(xml) { 
          var XML = $(xml);
            // Process the data retrieved
    },
    error: function(jqXHR, textStatus, errorThrown) {
        var XML = $(jqXHR.responseText);
            console.log("error: "+textStatus+"\n"+errorThrown);
    }
});

If an ampersand was typed in the input field and then saved, the ajax call fails when loading the page that displays the previously saved content, and the error event handler runs with the following error:

error: parsererror
Error: Invalid XML: <?xml version="1.0" encoding="UTF-8"?><response><target>    
<target_id>2095466</target_id>    
<plot>20029/13</plot>    
<builder>Lemminkäinen</builder>    
<housing_form>vm</housing_form>    
<block_name></block_name>    
<finnish_year>2013</finnish_year>    
<target_name>As Oy Helsingin Saukonranta</target_name>    
<target_address>Saukonpaadenranta 8</target_address>    
<office_space></office_space>    
<purpose></purpose>    
<reservations></reservations>    
<contacts></contacts>    
<infoflag>2</infoflag>    
<views>    
<view>    
<view_id>2095468</view_id>    
<copyright>B&M</copyright>    
</view>    
</views>    
</target>    
<status>OK</status><errormsg></errormsg></response> 

What is it that I'm doing wrong? Am I escaping the characters wrongly, or is it something else?

This question may seem to be a duplicate, but to me it doesn't seem like it since the ampersand characters have been escaped prior to being stored. I even tried adding additional (1, then two) amp;s to the escape string, but the result is EXACTLY the same.

Andrei Oniga
  • Not sure if this is the problem or just a typo, but shouldn't `var copyright = copyright.replace(/&/g,"&amp;"),` be `var copyright = copyright.replace("/&/g","&amp;"),`? – ametren Oct 22 '12 at 14:33
  • Not sure but try this `var copyrightEscaped = copyright.replace(/&/g,"&amp;")` – Shahid Oct 22 '12 at 14:39
  • you may need to escape `ä` in `Lemminkäinen` and other non-ASCII characters. See this [link](http://stackoverflow.com/questions/784586/convert-special-characters-to-html-in-javascript) – Shahid Oct 22 '12 at 14:43
  • The format of the regexp used with `replace` is not what's causing the problems, here's an XML block as it is sent to the server (after the ampersand chars have been escaped): `<request><session>{D7CEFA2E}</session><space>2081561</space><view>2081563</view><copyright>Kirsi Korhonen ja Mika Penttil A&amp;T</copyright></request>`. The original string was "[...] A&T". – Andrei Oniga Oct 22 '12 at 14:43
  • The special characters are not causing any problems, the call triggers the `error` event handler only when the string contains ampersand characters. – Andrei Oniga Oct 22 '12 at 14:44

2 Answers

It turns out that the problem actually came from the server (to which I did not have access): the script that handled the requests did not escape the ampersand characters correctly, even though they were escaped on the client side. Below is a JavaScript function that escapes all (?) special characters used with XML, just in case someone needs it:

function escapeXML(string){

    var str = string;
    str = str.replace(/\&/g,"&amp;");
    str = str.replace(/\>/g,"&gt;");
    str = str.replace(/\</g,"&lt;");
    str = str.replace(/\"/g,"&quot;");
    str = str.replace(/\'/g,"&apos;");

    return str;
}
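
As a rough sketch of how such escaping could be applied when building the request from the question, the snippet below escapes every interpolated value in a single pass. The names `XML_ENTITIES`, `escapeXmlText` and `buildRequestXml` and the sample values are made up for illustration; they are not from the original code.

```javascript
// Hypothetical lookup table for the five predefined XML entities.
const XML_ENTITIES = {
  "&": "&amp;",
  "<": "&lt;",
  ">": "&gt;",
  '"': "&quot;",
  "'": "&apos;"
};

// Escape all five characters in one pass, so an "&" produced by an
// earlier replacement can never be escaped a second time.
function escapeXmlText(value) {
  return String(value).replace(/[&<>"']/g, function (ch) {
    return XML_ENTITIES[ch];
  });
}

// Hypothetical builder mirroring the request layout from the question;
// every interpolated value is escaped, not only the copyright text.
function buildRequestXml(session, spaceId, viewId, copyright) {
  return "<request>" +
    "<session>" + escapeXmlText(session) + "</session>" +
    "<space>" + escapeXmlText(spaceId) + "</space>" +
    "<view>" + escapeXmlText(viewId) + "</view>" +
    "<copyright>" + escapeXmlText(copyright) + "</copyright>" +
    "</request>";
}

// Sample values, invented for this example.
const requestXml = buildRequestXml("abc", 1, 2, "B&M");
console.log(requestXml);
```

Note that this only protects the request. If the server writes the stored value back into its response without escaping it again, the response is still invalid XML, which is what happened here.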
Andrei Oniga

The problem is the ä character in Lemminkäinen in the builder node, as Shahid pointed out. If that ä is stored as the single ISO-8859-1 byte 0xE4 rather than as UTF-8, a UTF-8 decoder treats 0xE4 as the lead byte of a multi-byte sequence and tries to decode it together with the following i, which is not a valid sequence. The correct UTF-8 encoding of ä is the two bytes 0xC3 0xA4 (hexadecimal), which show up as Ã¤ when viewed as ISO-8859-1. Viewed that way, the full UTF-8 encoded text should be LemminkÃ¤inen.
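
A small sketch of that byte-level difference, assuming a runtime with the standard `TextEncoder` and `TextDecoder` globals (current browsers and Node.js). The helper name `decodesAsUtf8` is made up for this example, and the single-byte case assumes the data really was stored as ISO-8859-1, which the question does not confirm.

```javascript
// "Lemminkäinen", with ä written as an escape so this file stays ASCII-only.
const builderName = "Lemmink\u00e4inen";

// ISO-8859-1: every character here fits in one byte (ä becomes 0xE4).
const latin1Bytes = Uint8Array.from(builderName, function (ch) {
  return ch.charCodeAt(0);
});

// UTF-8: ä becomes the two bytes 0xC3 0xA4.
const utf8Bytes = new TextEncoder().encode(builderName);

// With fatal: true the decoder throws on malformed input instead of
// silently substituting U+FFFD.
function decodesAsUtf8(bytes) {
  try {
    new TextDecoder("utf-8", { fatal: true }).decode(bytes);
    return true;
  } catch (err) {
    return false;
  }
}

console.log(decodesAsUtf8(utf8Bytes));   // true
console.log(decodesAsUtf8(latin1Bytes)); // false: 0xE4 followed by "i" is malformed
```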

When the reported XML data is saved in an XML file then opened with a web browser, it'll fail on all major web browsers: Chrome ("Encoding error"), Firefox ("not well-formed"), Safari ("Encoding error"), MSIE ("An invalid character was found in text content."), and Opera ("illegal byte sequence in encoding").

Since the XML data came from the server, it's likely that the script that posted the builder data didn't specify a UTF-8 character set (there's no indication that the code shown in the question is the one that did it). It may have been caused by an old script that has since been fixed, but the damage was already done: incorrectly encoded data had been added to the database. Manual input into the database during server maintenance is another possible cause.

Jay