
The title pretty much says it all.

I'm sending an HTTP POST to a .dll provided to me. The response text contains information that I need to parse and display to the user in a human-readable way. I knew what the response should be, but my JavaScript was telling me the response didn't match, even though the response text looked exactly the same when I viewed it.

Well, when I looked a little closer and viewed the response using Chrome's dev tools, it showed that there are '\u0000' characters after every letter. Is it an end-of-character marker, or some other kind of terminator for each character?

My first guess was that it is a character encoding issue, but I'm not really sure.

Could anyone enlighten me as to what's really going on? How do I replace those characters so I can check for a substring in the response?

It's an AJAX POST request to a .dll served up by IIS 7, from a company called Magic Software.


Here's the response:

    HTTP/1.1 500 Internal Server Error
    Cache-Control: private
    Content-Type: text/html
    Server: Microsoft-IIS/7.5
    X-AspNet-Version: 4.0.30319
    X-Powered-By: ASP.NET
    Date: Thu, 21 Nov 2013 23:51:46 GMT
    Content-Length: 60

    <h1>Max instance reached.</h1>

EDIT:

I used the following function to convert the UTF-16 string that I was getting into UTF-8. It works for my purpose. I cobbled it together from two different sources:

http://jonisalonen.com/2012/from-utf-16-to-utf-8-in-javascript/
Convert integer array to string at javascript

I should have a much better grasp of character encodings, and I haven't fully read into what this code does as a whole. I'm going to do some reading. :P

Can someone look over this and tell me if it is an appropriate solution?

    function UTF16toUTF8Str(str) {
        // First pass: encode each UTF-16 code unit as UTF-8 bytes.
        var utf8 = [];
        for (var i = 0; i < str.length; i++) {
            var charcode = str.charCodeAt(i);
            if (charcode < 0x80) {
                // 1-byte sequence (ASCII)
                utf8.push(charcode);
            } else if (charcode < 0x800) {
                // 2-byte sequence
                utf8.push(0xc0 | (charcode >> 6),
                          0x80 | (charcode & 0x3f));
            } else if (charcode < 0xd800 || charcode >= 0xe000) {
                // 3-byte sequence
                utf8.push(0xe0 | (charcode >> 12),
                          0x80 | ((charcode >> 6) & 0x3f),
                          0x80 | (charcode & 0x3f));
            } else {
                // Surrogate pair: UTF-16 encodes 0x10000-0x10FFFF by
                // subtracting 0x10000 and splitting the 20 bits of
                // 0x0-0xFFFFF into two halves.
                i++;
                charcode = 0x10000 + (((charcode & 0x3ff) << 10)
                           | (str.charCodeAt(i) & 0x3ff));
                utf8.push(0xf0 | (charcode >> 18),
                          0x80 | ((charcode >> 12) & 0x3f),
                          0x80 | ((charcode >> 6) & 0x3f),
                          0x80 | (charcode & 0x3f));
            }
        }

        // Second pass: percent-encode each byte, skipping the nulls,
        // then let decodeURIComponent turn the percent-encoded UTF-8
        // back into a normal string.
        var encoded = '';
        for (var j = 0; j < utf8.length; j++) {
            if (utf8[j] !== 0) {
                encoded += '%' + ('0' + utf8[j].toString(16)).slice(-2);
            }
        }
        return decodeURIComponent(encoded);
    }
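
For illustration, this is roughly how I'd expect to use it in the response handler (`xhr` here stands for whatever request object the AJAX call produces):

    // Hypothetical usage in an AJAX completion handler.
    var raw = xhr.responseText;          // "<\u0000h\u00001\u0000>..." as received
    var readable = UTF16toUTF8Str(raw);  // "<h1>Max instance reached.</h1>"
    if (readable.indexOf('Max instance reached') !== -1) {
        // handle the "max instance" error case
    }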

EDIT:

Here is the response from a HAR file that I got from Chrome's dev tools:

        "response": {
          "status": 500,
          "statusText": "Internal Server Error",
          "httpVersion": "HTTP/1.1",
          "headers": [
            {
              "name": "Date",
              "value": "Fri, 22 Nov 2013 03:35:59 GMT"
            },
            {
              "name": "Cache-Control",
              "value": "private"
            },
            {
              "name": "Server",
              "value": "Microsoft-IIS/7.5"
            },
            {
              "name": "X-AspNet-Version",
              "value": "4.0.30319"
            },
            {
              "name": "X-Powered-By",
              "value": "ASP.NET"
            },
            {
              "name": "Content-Length",
              "value": "60"
            },
            {
              "name": "Content-Type",
              "value": "text/html"
            }
          ],
          "cookies": [],
          "content": {
            "size": 60,
            "mimeType": "text/html",
            "compression": 0,
            "text": "<\u0000h\u00001\u0000>\u0000M\u0000a\u0000x\u0000 \u0000i\u0000n\u0000s\u0000t\u0000a\u0000n\u0000c\u0000e\u0000 \u0000r\u0000e\u0000a\u0000c\u0000h\u0000e\u0000d\u0000.\u0000<\u0000/\u0000h\u00001\u0000>\u0000"
          },
          "redirectURL": "",
          "headersSize": 223,
          "bodySize": 60
        },
        "cache": {},
        "timings": {
          "blocked": 0,
          "dns": -1,
          "connect": -1,
          "send": 0,
          "wait": 475.0000000349246,
          "receive": 1.500034297350794,
          "ssl": -1
        },
        "connection": "21740",
        "pageref": "page_127"
      }
    ]
  }
}
Nathan Lutterman

1 Answer


This sounds like a character encoding issue to me. UTF-16 (like other 16-bit encodings) uses two bytes per code unit, and the high byte is 0x00 for most Western characters, which is exactly what you are seeing.
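
A quick illustration of what that looks like at the byte level (runnable in a browser console):

    // "Max" encoded as little-endian UTF-16 is the byte sequence
    //   4D 00 61 00 78 00
    // A decoder that assumes one byte per character (e.g. ISO 8859-1)
    // turns every byte into its own character:
    var misdecoded = 'M\u0000a\u0000x\u0000';
    console.log(misdecoded === 'Max');                        // false
    console.log(misdecoded.replace(/\u0000/g, '') === 'Max'); // true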

You could probably hack this back together with JavaScript. However, depending on how you receive the data, you may be able to specify the correct character set up front and let the browser take care of the decoding for you. If not, you can always write some server-side code that proxies the request and fixes up the response data before sending it to the client.
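
A minimal sketch of both client-side options, assuming a plain XMLHttpRequest (the .dll path is a placeholder):

    // Option 1: declare the charset up front so the browser decodes the
    // little-endian UTF-16 bytes itself. Must be set before send().
    var xhr = new XMLHttpRequest();
    xhr.open('POST', '/scripts/handler.dll'); // placeholder path
    xhr.overrideMimeType('text/html; charset=utf-16le');
    xhr.onload = function () {
        console.log(xhr.responseText); // "<h1>Max instance reached.</h1>"
    };
    xhr.send();

    // Option 2 (the JavaScript hack): strip the interleaved nulls after
    // the fact. Fine for an ASCII-only payload like this one, but it
    // corrupts any character whose code unit has a non-zero high byte.
    // (mangledText stands for the null-riddled responseText.)
    var cleaned = mangledText.replace(/\u0000/g, '');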

Brad
  • @user2864740 Right, but it might not be set at all, and it might be set wrong. Who knows what this DLL is doing. If Zaemz could post a packet capture, that would be helpful. – Brad Nov 21 '13 at 23:38
  • I put the full response up there. It isn't specifying a character encoding at all, but I don't know what to assume. The company likes to default to "West European", I think ISO 8859-1. – Nathan Lutterman Nov 21 '13 at 23:59
  • @Zaemz Since you're getting HTML back, the character set can actually be defined in the headers: `Content-type: text/html; charset=utf-16` I don't know what character set it is though. If you could post an actual packet capture with Wireshark or something, we could probably figure it out. But, try UTF-16 first. – Brad Nov 22 '13 at 03:10
  • I don't really understand or know how to use a packet sniffer. :( I tried, but there's so much stuff going in and out of the network, I can't see what I really want, or even know what I want, really. – Nathan Lutterman Nov 22 '13 at 03:14
  • @Zaemz Start a capture on your network card... if you are seeing things whizzing by then it is probably working. Then, go to your site. (Make sure the page isn't cached... either clear your cache or hit refresh a couple times.) Then, stop the capture and save a file. In Wireshark, you can see HTTP requests by typing `http` into the filter box. Or, you can just upload that whole file to the internet somewhere. Tell us what URL you tried to load, and we'll figure it out. You can also save a SAZ file with Chrome or Fiddler, which should work in this case. – Brad Nov 22 '13 at 03:16
  • If I'm sending and receiving packets on localhost, will they even show up in Wireshark? When I filter for HTTP requests, there's nothing, even though I'm listening on both network interfaces. This is also in a virtual machine; would that make a difference? – Nathan Lutterman Nov 22 '13 at 03:34
  • If you're using Windows, you can't sniff on localhost. If you're using a VM, you're actually using a virtual network interface and not actually using localhost. (Unless you use port forwarding or something, but in the end it has to go over the virtual network interface anyway.) Just open Google Chrome's developer tools, click the Network tab, right click, and save a HAR file. (I was wrong before; only Fiddler does the SAZ file. It doesn't matter, they are effectively the same thing.) – Brad Nov 22 '13 at 03:43
  • I am indeed using Windows Server 2008 R2, and it's a VM running under a hypervisor. I've added the response from the HAR file. – Nathan Lutterman Nov 22 '13 at 03:59