Extra characters when fetching HTTP using Socket

Question

I'm using a socket to connect to various XML web services. But when I convert the received bytes to a string (usually UTF-8 encoded), I get extra strings interspersed in the data. Most of the time the returned string starts with something like "4000\r\n", and then "\r\n4000\r\n" is interspersed through the data. Other times the string can be "\r\nd1ef\r\n" or other combinations of 4-8 hex "letters". Sometimes it is all at once. Some things I noticed:

If there is no "xxxx\r\n" in the beginning, the string is clean
I always get the same result (same extra strings at the same locations) if I call the same URL multiple times
The strings are usually 4 hex chars with "\r\n" around them, but they can also be 8 hex chars
It happens with many different webservices, so it's probably not on the server side
Since it always starts and ends with "\r\n" it cannot be random extra bytes of data

I'm guessing this is some kind of HTTP "paging"-feature or something that I am not aware of.

This is my code:

var client = new Socket(AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.Tcp);
client.ReceiveTimeout = timeout;
client.SendTimeout = timeout;
client.NoDelay = true;
client.Connect(server, port);

//send HTTP request
client.Send(totalData, totalData.Length, SocketFlags.None);

//read the data
var buffer = new byte[32];
byteStream = new MemoryStream();

while (true)
{
    var readCount = client.Receive(buffer, buffer.Length, SocketFlags.None);

    if (readCount > 0)
    {
        byteStream.Write(buffer, 0, readCount);
    }
    else
        break;
}

client.Disconnect(false);
client.Close();

//get the HTTP response
var bytes = byteStream.ToArray();

var ascii = Encoding.ASCII.GetString(bytes);

var bodyPosition = ascii.IndexOf("\r\n\r\n") + 4;

var bodyBytes = new byte[bytes.Length - bodyPosition];
Array.Copy(bytes,bodyPosition,bodyBytes,0,bodyBytes.Length);

var body = dataEncoding.GetString(bodyBytes);

Does anyone know what I'm doing wrong?

possible duplicate of [Why does the HTTP response body contain "2fb" at the beginning?](http://stackoverflow.com/questions/14955994/why-does-the-http-response-body-contain-2fb-at-the-beginning) — CodeCaster, Nov 13 '13 at 11:27

score 3 · Accepted Answer · answered Nov 13 '13 at 11:23

That is chunked transfer encoding. Use an HTTP library.

CodeCaster


Hmm, I figured as much. I did use HttpWebRequest but found that it was very slow in some situations, see [my previous post](http://stackoverflow.com/questions/18465504/httpwebrequest-is-slow-with-chunked-data). I guess I need to read up on chunked HTTP and see if I can modify my code to account for it. Thanks! – DukeOf1Cat Nov 13 '13 at 11:29
Can't help you there, but implementing chunked transfer encoding support shouldn't be too hard. :-) – CodeCaster Nov 13 '13 at 11:33
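
For anyone who does want to handle it by hand: in chunked transfer encoding the body is a series of chunks, each one a hexadecimal size line ending in "\r\n", then that many bytes of data, then another "\r\n". A chunk of size 0 ends the body. That is why "4000" (hex for 16384 bytes) keeps showing up in the question. Below is a minimal sketch of removing that framing from a response that has already been fully buffered, as in the question's code. The names ChunkedBody, IsChunked and Decode are made up for illustration and are not part of any library. The sketch assumes the whole body is in memory, ignores chunk extensions and trailers, and does not handle Content-Encoding (gzip etc.).

```csharp
using System;
using System.Globalization;
using System.IO;
using System.Text;

// Hypothetical helper, not a library API: strips chunked framing from a buffered body.
public static class ChunkedBody
{
    // True if the raw header block contains a Transfer-Encoding header mentioning "chunked".
    public static bool IsChunked(string headers)
    {
        foreach (var line in headers.Split(new[] { "\r\n" }, StringSplitOptions.RemoveEmptyEntries))
        {
            if (line.StartsWith("Transfer-Encoding:", StringComparison.OrdinalIgnoreCase) &&
                line.IndexOf("chunked", StringComparison.OrdinalIgnoreCase) >= 0)
                return true;
        }
        return false;
    }

    // Returns the payload bytes of a complete chunked body (everything after the headers).
    public static byte[] Decode(byte[] body)
    {
        var output = new MemoryStream();
        int pos = 0;
        while (true)
        {
            // The chunk-size line runs up to the next CRLF.
            int lineEnd = IndexOfCrLf(body, pos);
            if (lineEnd < 0) throw new InvalidDataException("Missing chunk-size line.");

            string sizeLine = Encoding.ASCII.GetString(body, pos, lineEnd - pos);

            // Drop optional chunk extensions (anything after ';').
            int semi = sizeLine.IndexOf(';');
            if (semi >= 0) sizeLine = sizeLine.Substring(0, semi);

            // The size is written in hexadecimal, e.g. "4000" means 16384 bytes.
            int size = int.Parse(sizeLine.Trim(), NumberStyles.HexNumber, CultureInfo.InvariantCulture);
            if (size < 0) throw new InvalidDataException("Invalid chunk size.");

            pos = lineEnd + 2;      // move past the size line's CRLF
            if (size == 0) break;   // last chunk; any trailers are ignored here

            if (size > body.Length - pos - 2) throw new InvalidDataException("Truncated chunk.");
            output.Write(body, pos, size);
            pos += size + 2;        // skip the chunk data and its trailing CRLF
        }
        return output.ToArray();
    }

    // Index of the first "\r\n" at or after start, or -1 if there is none.
    private static int IndexOfCrLf(byte[] data, int start)
    {
        for (int i = start; i + 1 < data.Length; i++)
            if (data[i] == (byte)'\r' && data[i + 1] == (byte)'\n') return i;
        return -1;
    }
}
```

With the variables from the question, this would be called just before the final GetString, something like: if (ChunkedBody.IsChunked(ascii.Substring(0, bodyPosition))) bodyBytes = ChunkedBody.Decode(bodyBytes);. Decoding the bytes before converting to a string matters, because a chunk boundary can fall in the middle of a multi-byte UTF-8 character.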
