I've built an Android proxy server that passes HTTP requests and responses using Java sockets.

The proxy works: all browser traffic passes through it. I can read the request and response headers, but the bodies seem to be encoded:

GET http://m.onet.pl/ HTTP/1.1
Host: m.onet.pl
Proxy-Connection: keep-alive
Cache-Control: max-age=0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Linux; Android 4.4.4; XT1039 Build/KXB21.14-L1.56) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.76 Mobile Safari/537.36
DNT: 1
Accept-Encoding: gzip, deflate, sdch
Accept-Language: en-GB,en;q=0.8,en-US;q=0.6,pl;q=0.4
Cookie: onet_ubi=201509221839473724130028; onetzuo_ticket=9AEDF08D278EC7965FF6A20BABD36EF0010012ED90FDD127C16068426F8B65A5D81A000000000000000050521881000000; onet_cid=dd6df83b3a8c33cd497d1ec3fcdea91b; __gfp_64b=2Mp2U1jvfJ3L9f.y6CbKfJ0oVfA7pVdBYfT58G1nf7T.p7; ea_uuid=201509221839478728300022; onet_cinf=1; __utma=86187972.1288403231.1442939988.1444999380.1445243557.40; __utmb=86187972.13.10.1445243557; __utmc=86187972; __utmz=86187972.1442939988.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none)

�����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������

So a lot of "���" characters show up in both requests and responses. I couldn't find any information about this kind of HTTP encoding. What is it? How can I read the body properly?


Assuming it might be a gzipped message, I tried:

while ((count = externalServerInputReader.read(buf, 0, buf.length)) != -1)
{
    String stream = new String(buf, 0 , count);
    proxyOutputStream.write(buf, 0, count);

    if (stream.contains("content-encoding: gzip")) {
        ByteArrayInputStream bais = new ByteArrayInputStream(buf);
        GZIPInputStream gzis = new GZIPInputStream(bais);
        InputStreamReader reader = new InputStreamReader(gzis);
        BufferedReader in = new BufferedReader(reader);

        String readed;
        while ((readed = in.readLine()) != null) {
            Log.d("Hello", "UnGzip: " + readed);
        }
    }
}
proxyOutputStream.flush();

However, I get an error when attempting to gunzip:

unknown format (magic number 5448)

Adam Styrc

1 Answer

I tried your sample request by saving it to "/tmp/req" and replaying it using cat /tmp/req | nc m.onet.pl 80. The server sent back a gzip-encoded response, which I could tell from the response header content-encoding: gzip. When the response is gzip encoded, you can decompress it in Java using java.util.zip.GZIPInputStream. Note that the user agent in your example also advertises support for "deflate" and "sdch", so you may get responses with those encodings as well. The "deflate" encoding can be decompressed using java.util.zip.InflaterInputStream. I'm not aware of any built-in support for sdch, so you would need to find or write a library to decompress it. See this other Stack Overflow question for a possible starting point: "Java SDCH compressor/decompressor".
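As a rough sketch of how that choice could look in code, the helper below wraps a body stream according to the Content-Encoding value. The class and method names (BodyDecoder, wrap, readAll) are made up for illustration and are not part of any library. It assumes the stream is already positioned at the first body byte and simply rejects encodings it does not know, such as sdch.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
import java.util.zip.InflaterInputStream;

class BodyDecoder {
    // Hypothetical helper: picks a decompressing stream for a Content-Encoding value.
    // Assumes 'body' starts at the first byte after the HTTP headers.
    static InputStream wrap(String contentEncoding, InputStream body) throws IOException {
        String enc = contentEncoding == null
                ? ""
                : contentEncoding.trim().toLowerCase(Locale.ROOT);
        switch (enc) {
            case "gzip":
                return new GZIPInputStream(body);
            case "deflate":
                // InflaterInputStream expects zlib-wrapped deflate data by default.
                return new InflaterInputStream(body);
            case "":
            case "identity":
                return body; // not compressed
            default:
                // e.g. "sdch": no built-in support in java.util.zip
                throw new IOException("Unsupported content-encoding: " + enc);
        }
    }

    // Drains a stream into a byte array.
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Round trip: gzip a small body, then decode it through wrap().
        ByteArrayOutputStream compressed = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(compressed)) {
            gz.write("hello".getBytes(StandardCharsets.UTF_8));
        }
        InputStream decoded = wrap("gzip", new ByteArrayInputStream(compressed.toByteArray()));
        System.out.println(new String(readAll(decoded), StandardCharsets.UTF_8));
    }
}
```

The header value is lower-cased before matching because servers are free to send it in any case.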

To address the updated part of your question, where you added an attempt at using GZIPInputStream: the most immediate issue is that you should only gunzip the stream after the HTTP response headers have ended. Your code hands GZIPInputStream a buffer that still starts with the headers. The magic number 5448 in your error is consistent with this: it is the bytes 'H' (0x48) and 'T' (0x54) from the start of "HTTP/1.1" read as a little-endian 16-bit value, where gzip expects 0x8b1f. The simplest thing to do would be to wait for "\r\n\r\n" to come across the underlying InputStream (not a Reader) and then run the data, starting with the next byte, through a single GZIPInputStream. That should probably work for the example you gave; I successfully decoded the replayed response I got using gunzip -c. For thoroughness, there are some other issues that will keep this from working as a general solution for arbitrary websites, but I think it will be enough to get you started. Some examples: 1) You might miss a "content-encoding" header because you are splitting the response into chunks of length buf.length, so the header can straddle two reads. 2) Responses that use chunked transfer encoding would need to be de-chunked first. 3) Keep-alive responses would require you to track where each response ends rather than waiting for end of stream.
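A minimal sketch of that first-pass approach is below. The names (ResponseSplitter, bodyStart, gunzipBody) are invented for the example. It assumes the whole response has already been collected into one byte array, that the body is gzip encoded and not chunked, and that the decompressed text is UTF-8. None of those assumptions hold for arbitrary sites.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class ResponseSplitter {
    // Returns the index of the first body byte (just past "\r\n\r\n"),
    // or -1 if the end of the headers has not been seen in data[0..length).
    static int bodyStart(byte[] data, int length) {
        for (int i = 0; i + 3 < length; i++) {
            if (data[i] == '\r' && data[i + 1] == '\n'
                    && data[i + 2] == '\r' && data[i + 3] == '\n') {
                return i + 4;
            }
        }
        return -1;
    }

    // Gunzips everything after the headers. Assumes a complete, non-chunked,
    // gzip-encoded response whose decompressed body is UTF-8 text.
    static String gunzipBody(byte[] response) throws IOException {
        int start = bodyStart(response, response.length);
        if (start < 0) {
            throw new IOException("End of headers (\\r\\n\\r\\n) not found");
        }
        try (InputStream in = new GZIPInputStream(
                new ByteArrayInputStream(response, start, response.length - start))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        // Build a fake response: plain-text headers followed by a gzipped body.
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(body)) {
            gz.write("<html>hi</html>".getBytes(StandardCharsets.UTF_8));
        }
        ByteArrayOutputStream response = new ByteArrayOutputStream();
        response.write("HTTP/1.1 200 OK\r\ncontent-encoding: gzip\r\n\r\n"
                .getBytes(StandardCharsets.US_ASCII));
        response.write(body.toByteArray());

        System.out.println(gunzipBody(response.toByteArray()));
    }
}
```

In the proxy loop you would keep forwarding the raw bytes to the client unchanged and only feed a copy into something like this for logging. The search works on bytes rather than on a String so the compressed body is never pushed through a character decoder.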

twm
  • (question updated) I tried to gunzip it, but it doesn't seem to be gzip. Could it be one of those other encodings? How can I tell which one it is? – Adam Styrc Oct 20 '15 at 13:43
  • You've got to gunzip just the response body. Your current code includes the headers in what gets gunzipped. Try skipping ahead to the first "\r\n\r\n" or "\n\n" for a first-pass implementation. That won't be enough to handle chunked encoding, but it might be enough to get you started. – twm Oct 20 '15 at 14:07