I'm trying to figure out how an HTTP server encodes/splits a file during an HTTP download.

In Wireshark I can find four HTTP headers (see below) and a bunch of TCP packets without any headers. I would like to know how the TCP packets are formed and whether I can retrieve any concrete data from them (such as the name of the file, an ID, or anything else substantial).

First header :

GET /upload/toto.test HTTP/1.1
Host: 192.168.223.167:90
Connection: keep-alive
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.114 Safari/537.36
Accept-Encoding: gzip,deflate,sdch
Accept-Language: fr-FR,fr;q=0.8,en-US;q=0.6,en;q=0.4
Range: bytes=3821-3821
If-Range: "40248-5800428-4fab43ec800ce"

Second header :

HTTP/1.1 206 Partial Content
Date: Sat, 31 May 2014 21:25:31 GMT
Server: Apache/2.2.22 (Debian)
Last-Modified: Sat, 31 May 2014 15:59:21 GMT
ETag: "40248-5800428-4fab43ec800ce"
Accept-Ranges: bytes
Content-Length: 1
Content-Range: bytes 3821-3821/92275752
Keep-Alive: timeout=5, max=100
Connection: Keep-Alive

Third :

GET /upload/toto.test HTTP/1.1
Host: 192.168.223.167:90
Connection: keep-alive
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.114 Safari/537.36
Accept-Encoding: gzip,deflate,sdch
Accept-Language: fr-FR,fr;q=0.8,en-US;q=0.6,en;q=0.4
Range: bytes=3821-92275751
If-Range: "40248-5800428-4fab43ec800ce"

Last one :

HTTP/1.1 206 Partial Content
Date: Sat, 31 May 2014 21:25:31 GMT
Server: Apache/2.2.22 (Debian)
Last-Modified: Sat, 31 May 2014 15:59:21 GMT
ETag: "40248-5800428-4fab43ec800ce"
Accept-Ranges: bytes
Content-Length: 92271931
Content-Range: bytes 3821-92275751/92275752
Keep-Alive: timeout=5, max=99
Connection: Keep-Alive

TCP packet following the fourth HTTP header (in ASCII) :

PV)?FEM@@cZU:P-O"-~zLW^2&$Z$f5APzve~BuH5/}`z2MI"{lIQCBmTO-ah6O)497Kro+gS((R
8n8_lMXusDp{Qs1g?j~iZaB.ADI|yp((t3@4SA4[MV@N1(2He|a9}Dw`'=k^C;G%@KUD``Sw:KXYG1{pxP,*`BSAMO0?FlFb(~X/|Ub=H[b7Y'NAP])IARH(g*LI}AE%BzFOzN5Xf7$D|.Hw00AUh[lE)ovKAUmcSuFnzQS+T0=z7;#nKX2!>ik)p73a5{h2ZZo~etin"UCFc+#ZjgB60y()-1{e|XRj9r:zDM(ulcSAayGeZCks7Nnz{L8(&L8Ew?J9}WA/t?^xS{sbnw8J7/%Iqt0i4_h*D6?|[&3zFngl~ku>#RVp+:`'RdtKh(",MPJqx5
tov&pZV8)'X?iW(J1d-!]FM>_Q\V=&xYH C9G?dp6&
\td|k$AY!D^`HnW=OsMcbV(*(RQL-xhWPa\:C>-M'oH fGwr:0=\K7!lMoPH)fB2OSUrg89

For the curious, this file is an image of Android (sample for the question).

EDIT for CodeCaster :

I'm trying to limit the outgoing bandwidth generated by a download requested from a Node.js server. The catch is that I have to do this at the network level (with iptables, actually) and not at the code level. To do this, and because it is a per-user limit, I need to find a distinctive string (ASCII or hexadecimal) that I could use to filter packets and limit that user's download bandwidth. My original question is about how the content is formatted/encoded. I'm not looking for another approach (I know there are some); this is a constraint of my context.

Arka
  • I don't remember the specifics of the protocol, but I strongly suspect that the segmentation is determined solely by the web server. The only useful information you'll be able to pull is from the http headers. – Sam Dufel May 31 '14 at 22:01
  • Please explain your actual problem. TCP packet segmentation is handled by the OS and is none of your concern; see for example http://stackoverflow.com/questions/756765/when-will-a-tcp-network-packet-be-fragmented-at-the-application-layer. Your question seems to be about HTTP range requests. – CodeCaster May 31 '14 at 22:05
  • I edited my post to give you an explanation. – Arka May 31 '14 at 22:16
  • A single packet most likely does not contain enough information for that, especially since you're talking about downloads over HTTP, where the headers will be in the first packet or so and the rest will be the response body. Are you sure what you're asking is getting you closer to your goal? Did you read, for example, http://serverfault.com/questions/154451/throttle-bandwidth-via-iptables? – CodeCaster May 31 '14 at 22:31
  • Yes, I did, thanks! I think I will use something else, like the destination port or IP, since there is no other way. – Arka May 31 '14 at 22:57

1 Answer

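TCP is a transport-layer protocol (layer 4 of the OSI model), and PDUs (aka packets) are processed in each layer of that model. On the way down, each layer adds its own header, so by the time the data reaches the transport layer, it already carries one header from the application layer (here, the HTTP header). TCP then puts on its own header, and the PDU goes on to the network layer for further processing. Note that the HTTP header appears only once, at the start of each response; the body is then split across many TCP segments, which is why most packets in your capture contain nothing but raw file bytes.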

As for the data size of the PDU, that depends on the link layer's MTU (maximum transmission unit). For instance, Ethernet's MTU is 1500 bytes.
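To make the MTU point concrete, here is a rough sketch of how many TCP segments the second response body in the question would need. It assumes minimum-size IPv4 and TCP headers (20 bytes each, no options) and ignores the bytes taken by the HTTP response header itself; the function names are made up for illustration, and a real capture may show smaller payloads (for example when TCP timestamps are enabled).

```python
# Assumed sizes: minimum headers, no IP or TCP options.
ETHERNET_MTU = 1500   # bytes, as stated above
IPV4_HEADER = 20      # minimum IPv4 header size
TCP_HEADER = 20       # minimum TCP header size


def max_segment_payload(mtu, ip_header=IPV4_HEADER, tcp_header=TCP_HEADER):
    """Bytes of application data that fit in one segment under the assumptions above."""
    return mtu - ip_header - tcp_header


def segments_needed(body_bytes, payload_per_segment):
    """Number of segments needed to carry body_bytes (integer ceiling division)."""
    return (body_bytes + payload_per_segment - 1) // payload_per_segment


payload = max_segment_payload(ETHERNET_MTU)
body = 92271931  # Content-Length of the last response in the question
count = segments_needed(body, payload)
last_segment = body - (count - 1) * payload

print(payload, count, last_segment)
```

Under these assumptions each segment carries up to 1460 bytes of the file, and none of those segments repeats the file name or any other HTTP-level identifier.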

And as far as getting data goes, if you mean from the header, it's simple enough to code a solution that searches for certain attributes (like Content-Length or Server). If you mean getting data from the payload of the data PDUs, that is generally not a good idea unless you are doing it for analysis, in which case Wireshark should work. (If I recall; it's been a long time since I used Wireshark.)
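As a minimal sketch of the header search, assuming you already have the raw bytes of the start of a response (for example, exported from a capture), something like the following would pull out the attributes. The helper names and the one-byte body `X` are placeholders for illustration; the header values are copied from the first response in the question.

```python
def parse_http_head(raw):
    """Split raw response bytes into (status line, headers dict, body bytes)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    headers = {}
    for line in lines[1:]:
        # Split on the first colon only, so values such as dates stay intact.
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return lines[0], headers, body


def parse_content_range(value):
    """Turn 'bytes 3821-3821/92275752' into (3821, 3821, 92275752)."""
    _unit, _, rest = value.partition(" ")
    span, _, total = rest.partition("/")
    first, _, last = span.partition("-")
    return int(first), int(last), int(total)


sample = (
    b"HTTP/1.1 206 Partial Content\r\n"
    b"Date: Sat, 31 May 2014 21:25:31 GMT\r\n"
    b"Server: Apache/2.2.22 (Debian)\r\n"
    b"Content-Length: 1\r\n"
    b"Content-Range: bytes 3821-3821/92275752\r\n"
    b"\r\n"
    b"X"  # placeholder for the single requested byte
)

status, headers, body = parse_http_head(sample)
first, last, total = parse_content_range(headers["content-range"])

print(status)
print(headers["server"], headers["content-length"])
print(first, last, total)
```

This only works on the packet(s) that actually contain the header; the segments that follow hold only the body.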

Shoikana