Is there a way in Wireshark to reconstruct a complete TCP session (HTML pages) from pcap files? Can Wireshark do the reconstruction itself, or is there another tool that can? The data streamed from the source may be gzip-compressed or uncompressed, and the end result of the reconstruction should be a valid, complete HTML page with all of its contents.
- Although not quite the same scope, [Fiddler](http://www.fiddler2.com/fiddler2/) might handle that. – Orbling May 27 '11 at 11:32
- @Orbling, that is not really what I want, but thanks for the info. – May 27 '11 at 12:22
- @Wajih: Well, it depends on what you want, which is why I said it has a slightly different scope. Fiddler can record the packets going into and out of the browser and replay them to an extent, but it does not have Wireshark's wider promiscuous-mode capture style. – Orbling May 27 '11 at 13:25
- @Orbling, I might need something better. I can't just look at files; I need the whole HTML page on disk, as it would appear in a browser. I know the links to images will be there, so once the page is opened in a browser, the images and any linked content would appear. My requirement is just the complete session. – May 27 '11 at 13:35
- @Wajih: Yes, I can see what you need. To my knowledge, Fiddler can record a whole session, but only on the machine in question. It also knows nothing of non-network events within the browser, and so much content is dynamic these days. – Orbling May 27 '11 at 13:40
- @Orbling, recording is only half the problem... :( – May 27 '11 at 14:06
5 Answers
You can also use Bro if you prefer a command-line interface. Simply load it with the contents script:
bro -r trace.pcap -f 'port 80' contents
(The BPF filter expression -f 'port 80' is optional and can be omitted.) This extracts the full TCP streams and writes them to files of the form:
contents.<sourceIP>.<sourcePORT>-<destinationIP>.<destinationPORT>
As Christian mentioned, the reassembly is highly robust and has been tested thoroughly.
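Since each file name encodes both endpoints, the two directions of one connection can be matched up afterwards. Here is a rough Python sketch of that, not part of Bro itself. It assumes IPv4 addresses, the naming scheme quoted above, and one file per direction; `Flow`, `parse_contents_name` and `pair_directions` are names made up for illustration.

```python
import re
from typing import NamedTuple, Optional


class Flow(NamedTuple):
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int


# Assumption: IPv4 only, names shaped like
# contents.<sourceIP>.<sourcePORT>-<destinationIP>.<destinationPORT>
_NAME = re.compile(
    r"^contents\.(\d{1,3}(?:\.\d{1,3}){3})\.(\d+)"
    r"-(\d{1,3}(?:\.\d{1,3}){3})\.(\d+)$"
)


def parse_contents_name(name: str) -> Optional[Flow]:
    """Split a contents file name into its endpoints, or return None."""
    m = _NAME.match(name)
    if m is None:
        return None
    return Flow(m.group(1), int(m.group(2)), m.group(3), int(m.group(4)))


def pair_directions(names, server_port=80):
    """Return (client_to_server, server_to_client) file name pairs."""
    flows = {}
    for name in names:
        flow = parse_contents_name(name)
        if flow is not None:
            flows[flow] = name
    pairs = []
    for flow, name in flows.items():
        # The reverse direction has source and destination swapped.
        back = Flow(flow.dst_ip, flow.dst_port, flow.src_ip, flow.src_port)
        if flow.dst_port == server_port and back in flows:
            pairs.append((name, flows[back]))
    return sorted(pairs)
```

Feeding it the names from a directory listing (for example `os.listdir(".")`) gives one pair per HTTP connection: the request stream first, the response stream second.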
TCPTrace has an option (-e) for this:
Extracting: The -e option can be used to extract the contents (TCP data payload) of each connection into a separate data file.
For example,
Beluga:/Users/mani> tcptrace -e albus.dmp
generates files a2b_contents.dat, b2a_contents.dat; c2d_contents.dat, d2c_contents.dat if the file albus.dmp had 2 traced TCP connections. tcptrace is pretty smart in generating these contents files. It does not commit trivial mistakes like saving retransmissions multiple times in the file for example, and is aware of sequence space wrap-arounds. However, if you want the entire contents of the traffic, please make sure that packets are captured in their entirety (give suitable snaplen value with tcpdump for example).
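Note that an extracted server-to-client file is still raw HTTP: status line, headers, and a body that may be chunked and/or gzip-compressed. Below is a minimal Python sketch for getting the HTML out of the first response in such a file. It is not something tcptrace provides. It assumes HTTP/1.x traffic, and the function names are made up for illustration.

```python
import gzip


def _dechunk(data: bytes) -> bytes:
    """Undo HTTP/1.1 chunked transfer encoding (trailers are ignored)."""
    body = bytearray()
    pos = 0
    while True:
        line_end = data.index(b"\r\n", pos)
        # The chunk size is hexadecimal; anything after ';' is an extension.
        size = int(data[pos:line_end].split(b";")[0].strip(), 16)
        if size == 0:
            return bytes(body)
        start = line_end + 2
        body += data[start:start + size]
        pos = start + size + 2  # skip the CRLF that ends the chunk


def decode_first_response(stream: bytes):
    """Return (status_line, headers, decoded_body) for the first response."""
    head, sep, rest = stream.partition(b"\r\n\r\n")
    if not sep:
        raise ValueError("no complete HTTP header block found")
    lines = head.decode("iso-8859-1").split("\r\n")
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()

    if "chunked" in headers.get("transfer-encoding", "").lower():
        body = _dechunk(rest)
    elif "content-length" in headers:
        body = rest[: int(headers["content-length"])]
    else:
        body = rest  # no length given: body runs to the end of the stream

    if headers.get("content-encoding", "").lower() == "gzip":
        body = gzip.decompress(body)
    return lines[0], headers, body
```

With the example above, something like `decode_first_response(open("b2a_contents.dat", "rb").read())` would be the starting point, assuming b2a is the server-to-client direction. A connection that carries several responses would need a loop that continues after each body.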

- If only it would prepend the TCP stream ordinal numbers to the filenames, as it does in the stdout log... I will have to write some Perl :) – Pavel Zdenek Jul 10 '12 at 15:33
- The link on TCPTrace is broken; it sends me to “Apache HTTP Server Version 2.2 Documentation”. It should point to the [start page](http://www.tcptrace.org/) or [the docs](http://www.tcptrace.org/manual.html) instead. – mknecht Jun 02 '15 at 08:39
Depending on which version of Wireshark you have, you should be able to do something along these lines:
- Filter for the session you care about.
- Choose File->Export->Objects->HTTP.
- Select a folder.
Is there something more you need? This appears to handle the gzip decompression, etc. It won't work if the traffic is SSL-encrypted (it MIGHT if you can get the appropriate keys to make the SSL decoding work, but that gets trickier, and I'd suggest trying Fiddler in that case).
HTH
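The same export can be scripted with tshark, the command-line tool that ships with Wireshark, through its `--export-objects` option. A minimal Python wrapper might look like the sketch below. It assumes a tshark recent enough to support that option is on the PATH; the function names are made up for illustration.

```python
import subprocess
from pathlib import Path


def build_export_command(pcap_path, out_dir, tshark="tshark"):
    """Build a tshark command line that exports HTTP objects from a capture."""
    return [
        tshark,
        "-r", str(pcap_path),                    # read from a capture file
        "-q",                                    # no per-packet output
        "--export-objects", f"http,{out_dir}",   # <protocol>,<destination dir>
    ]


def export_http_objects(pcap_path, out_dir):
    """Run tshark and return the files it wrote into out_dir."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(build_export_command(pcap_path, out_dir), check=True)
    return sorted(p for p in Path(out_dir).iterdir() if p.is_file())
```

Calling `export_http_objects("trace.pcap", "http_objects")` would then leave the pages, images and scripts found in the capture as separate files in that folder. The file and folder names here are just examples.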

- Probably half of what I need! I guess I will look into the Wireshark code and do the rest of the reassembly myself! – Jun 10 '11 at 12:21
- @Wajih Interesting... if you don't mind my asking, what else would you need reassembled? (As I was asking this, I realized that one thing I have no idea how Wireshark handles is anything AJAX-related, though I guess if the responses are XML rather than JSON, it could save them as XML files.) – Foon Jun 10 '11 at 17:04
- It means rebuilding the webpage completely, with all its constituents. Wireshark won't do anything with the scripts when saving as you have mentioned; rather, I would carve out a complete page from what Wireshark dumps. – Jun 10 '11 at 17:10
I suggest tcpflow, a full-featured TCP/IP session reconstructor. It is very fast, handles very large sessions, automatically decompresses gzipped connections, automatically breaks out MIME objects sent over HTTP, and creates an XML file describing what it has done. It is a command-line tool that runs on macOS, Linux and Windows.

Use justniffer-grab-http-traffic. It is based on justniffer and is an excellent tool for rebuilding TCP streams.
