
I'm trying to parse a byte[] in Java that represents an HTTP response. The question "Is there any simple http response parser for Java?" asks exactly this, but its accepted answer doesn't help me. Looking at http://hc.apache.org/httpcomponents-core-ga/httpcore/apidocs/org/apache/http/io/HttpMessageParser.html, I don't understand how that interface would help.

edited by Community, asked by Gijs
  • What mechanism is providing you with this byte array? What method are you using to actually communicate with the HTTP server? – user3062946 Oct 24 '14 at 14:31
  • The data is coming from WARC files collected with a web crawler. I know there's a library that parses the whole WARC file, but I'm using this Hadoop mapper https://github.com/ept/warc-hadoop, which uses its own WARCRecord format. There are multiple routes around this, but I thought parsing an HTTP response should be doable. – Gijs Oct 24 '14 at 14:35
  • The docs you linked say "This library currently doesn't perform any parsing of the data inside records, such as the HTTP headers or the HTML body. You can simply read the server's response as an array of bytes. Additional parsing functionality may be added in future versions." -- Does that mean that the byte array can just be used to create a String that shows the textual HTTP response? – user3062946 Oct 24 '14 at 14:40
  • Yes, exactly. You'd get something like `HTTP/1.1 301 Moved Permanently Alternate-Protocol: 80:quic,p=0.01 Cache-Control: public, max-age=2592000 Content-Length: 218 Content-Type: text/html; charset=UTF-8 Date: Fri, 24 Oct 2014 14:43:20 GMT Expires: Sun, 23 Nov 2014 14:43:20 GMT Location: http://www.google.nl/ Server: gws X-Frame-Options: SAMEORIGIN X-XSS-Protection: 1; mode=block 301 Moved 301 Moved` – Gijs Oct 24 '14 at 14:44
  • Ok, so basically you're asking if there's a library that will turn a textual HTTP response into some object that represents it... I don't know of any standard library implementation of this. I think your best bet is to go with the answer you originally linked: write a custom parser that implements the HttpMessageParser interface. Your `parse()` method would likely do some string manipulation to instantiate an HttpMessage object with all of the values contained in your string response. Long story short: you probably need to write your own code to parse it. – user3062946 Oct 24 '14 at 14:50
  • Yes, that's it. Thanks for your help. I'll try and do this or find some other route. Not to take this out on anyone, but REALLY? http://asset-3.soup.io/asset/2905/6018_3568_450.jpeg. – Gijs Oct 24 '14 at 15:05
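The hand-written parser suggested in the comments can be sketched with the JDK alone, without HttpCore. This is a minimal sketch, not a library API: the class `RawHttpResponse` and its `header` method are hypothetical, and it assumes the record's payload starts directly with the status line, uses CRLF line endings, and has header bytes that decode as ISO-8859-1. The body is returned untouched, so a chunked or gzip-encoded body still needs decoding.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper (not part of any library): splits a raw HTTP response
// into status line, headers and body. Assumes CRLF line endings.
final class RawHttpResponse {
    final String protocol;
    final int statusCode;
    final String reasonPhrase;
    // Lower-cased header name -> values in order of appearance (repeats kept).
    final Map<String, List<String>> headers;
    // Raw bytes after the blank line; may still be chunked or gzip-encoded.
    final byte[] body;

    private RawHttpResponse(String protocol, int statusCode, String reasonPhrase,
                            Map<String, List<String>> headers, byte[] body) {
        this.protocol = protocol;
        this.statusCode = statusCode;
        this.reasonPhrase = reasonPhrase;
        this.headers = headers;
        this.body = body;
    }

    static RawHttpResponse parse(byte[] raw) {
        int headerEnd = indexOf(raw, new byte[] {'\r', '\n', '\r', '\n'});
        if (headerEnd < 0) {
            throw new IllegalArgumentException("No blank line terminating the headers");
        }
        // ISO-8859-1 maps every byte to exactly one char, so nothing is lost.
        String head = new String(raw, 0, headerEnd, StandardCharsets.ISO_8859_1);
        String[] lines = head.split("\r\n");

        // Status line, e.g. "HTTP/1.1 301 Moved Permanently"; the reason may contain spaces.
        String[] status = lines[0].split(" ", 3);
        if (status.length < 2) {
            throw new IllegalArgumentException("Bad status line: " + lines[0]);
        }
        String reason = status.length == 3 ? status[2] : "";

        Map<String, List<String>> headers = new LinkedHashMap<>();
        for (int i = 1; i < lines.length; i++) {
            int colon = lines[i].indexOf(':');
            if (colon <= 0) {
                continue; // skip malformed lines; obsolete line folding is not handled
            }
            String name = lines[i].substring(0, colon).trim().toLowerCase(Locale.ROOT);
            String value = lines[i].substring(colon + 1).trim();
            headers.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
        }
        byte[] body = Arrays.copyOfRange(raw, headerEnd + 4, raw.length);
        return new RawHttpResponse(status[0], Integer.parseInt(status[1]), reason, headers, body);
    }

    // First value of a header (case-insensitive name), or null if absent.
    String header(String name) {
        List<String> values = headers.get(name.toLowerCase(Locale.ROOT));
        return values == null ? null : values.get(0);
    }

    // Naive search for the first occurrence of pattern in data; -1 if absent.
    private static int indexOf(byte[] data, byte[] pattern) {
        outer:
        for (int i = 0; i + pattern.length <= data.length; i++) {
            for (int j = 0; j < pattern.length; j++) {
                if (data[i + j] != pattern[j]) {
                    continue outer;
                }
            }
            return i;
        }
        return -1;
    }
}
```

For a WARC record you would pass the record's payload bytes to `RawHttpResponse.parse` and check `header("Transfer-Encoding")` and `header("Content-Encoding")` before interpreting `body`.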

2 Answers


This should get you started:

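import java.io.ByteArrayInputStream;
import org.apache.http.Consts;
import org.apache.http.HttpResponse;
import org.apache.http.impl.io.DefaultHttpResponseParser;
import org.apache.http.impl.io.HttpTransportMetricsImpl;
import org.apache.http.impl.io.SessionInputBufferImpl;

String s = "HTTP/1.1 200 OK\r\n" +
        "Content-Length: 100\r\n" +
        "Content-Type: text/plain\r\n" +
        "Server: some-server\r\n" +
        "\r\n";
// Buffer the parser reads from; 2048 is the buffer size in bytes
SessionInputBufferImpl sessionInputBuffer = new SessionInputBufferImpl(new HttpTransportMetricsImpl(), 2048);
// Bind it to the raw bytes (with a WARC record, wrap its byte[] here instead)
sessionInputBuffer.bind(new ByteArrayInputStream(s.getBytes(Consts.ASCII)));
DefaultHttpResponseParser responseParser = new DefaultHttpResponseParser(sessionInputBuffer);
// Parses the status line and headers only; the body stays unread in the buffer
HttpResponse response = responseParser.parse();
System.out.println(response);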

This code produces the following output:

HTTP/1.1 200 OK [Content-Length: 100, Content-Type: text/plain, Server: some-server]
ok2c

Check this out: https://github.com/ipinyol/proxy-base

This is a simple, highly configurable HTTP proxy. The method readHeader of the class org.mars.proxybase.ProxyThread parses the HTTP headers from a DataInputStream (reading byte by byte) and returns an object of type Header containing the header information.
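Reading the head straight from a byte stream, as readHeader is described as doing, can be sketched with the JDK alone. This is not the proxy-base code: `HeadReader` is a hypothetical name, and the sketch assumes header lines end in CRLF (a bare LF is tolerated). Because it reads one byte at a time, the stream is left positioned exactly at the first body byte.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: reads the status line and header lines from a stream,
// one byte at a time, stopping after the blank line that ends the head.
final class HeadReader {
    // Returns the head lines (status line first) without line terminators.
    static List<String> readHead(DataInputStream in) throws IOException {
        List<String> lines = new ArrayList<>();
        String line;
        while (!(line = readLine(in)).isEmpty()) {
            lines.add(line);
        }
        return lines;
    }

    // Reads up to and including '\n'; a trailing '\r' is dropped.
    private static String readLine(DataInputStream in) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        while (true) {
            int b = in.read();
            if (b == -1) {
                throw new EOFException("Stream ended inside the header block");
            }
            if (b == '\n') {
                break;
            }
            buf.write(b);
        }
        byte[] bytes = buf.toByteArray();
        int len = bytes.length;
        if (len > 0 && bytes[len - 1] == '\r') {
            len--;
        }
        return new String(bytes, 0, len, StandardCharsets.ISO_8859_1);
    }
}
```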

Also, as you probably know, a response body is either delimited by a Content-Length header or sent with chunked transfer encoding, in which case you must read it chunk by chunk. The methods readContent and readContentByChunk of the same class perform this reading of the content. You can explore the code yourself and modify it accordingly.
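The two body cases can be sketched with the JDK as well. Again this is not the proxy-base implementation: `BodyReader` is a hypothetical name, the stream is assumed to be positioned just after the blank line that ends the headers, and the Content-Length is assumed to fit in an `int`. For chunked bodies each chunk is a hexadecimal size line, the data, and a CRLF, ending with a zero-size chunk plus optional trailer lines.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.IOException;

// Hypothetical helper: reads an HTTP/1.1 response body from a stream
// positioned just after the blank line that ends the headers.
final class BodyReader {
    // Body delimited by Content-Length: read exactly that many bytes.
    static byte[] readFixed(DataInputStream in, int contentLength) throws IOException {
        byte[] body = new byte[contentLength];
        in.readFully(body); // throws EOFException if the stream is shorter
        return body;
    }

    // Body sent with "Transfer-Encoding: chunked": "<hex size>[;ext]\r\n<data>\r\n"
    // repeated, then a zero-size chunk, optional trailers and a blank line.
    static byte[] readChunked(DataInputStream in) throws IOException {
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        while (true) {
            String sizeLine = readLine(in);
            int semi = sizeLine.indexOf(';'); // chunk extensions are ignored
            String hex = (semi >= 0 ? sizeLine.substring(0, semi) : sizeLine).trim();
            int size = Integer.parseInt(hex, 16);
            if (size == 0) {
                break;
            }
            byte[] chunk = new byte[size];
            in.readFully(chunk);
            body.write(chunk);
            readLine(in); // consume the CRLF that follows the chunk data
        }
        // Skip trailer lines up to the terminating blank line (or end of stream).
        while (!readLine(in).isEmpty()) {
            // discard trailer
        }
        return body.toByteArray();
    }

    // Reads one line as ISO-8859-1 chars, dropping '\r' and the final '\n'.
    private static String readLine(DataInputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        int b;
        while ((b = in.read()) != -1 && b != '\n') {
            if (b != '\r') {
                sb.append((char) b);
            }
        }
        return sb.toString();
    }
}
```

Which method to call depends on the headers: use `readChunked` when Transfer-Encoding is chunked, otherwise `readFixed` with the Content-Length value.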

ipinyol
  • Thanks. I hope there's a less work-intensive way, but I might try this if there's nothing else. – Gijs Oct 24 '14 at 15:07