I am writing a script to scrape data from the web. I would like to calculate the size of each request and response to monitor my network consumption. Is there any way to do this?

1 Answer

It is tempting to use the response's Content-Length header to get an idea of the network usage:

>>> import requests
>>> response = requests.get('http://edition.cnn.com')
>>> response.headers['Content-Length']
'28321'

Problem:

This is not accurate from a network consumption standpoint. The Content-Length header field gives only the size of the HTTP response body; it does not account for the HTTP headers or for the Ethernet, IP, and TCP headers of the packets that carry them.
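To illustrate the first part of that gap, here is a minimal sketch that estimates the size of one HTTP/1.1 message (start line, headers, and body) instead of the body alone. The function name `http_message_size` and its parameters are my own, not part of requests, and the result still ignores everything below HTTP (TCP, IP, Ethernet), so treat it as a lower bound.

```python
def http_message_size(start_line, headers, body=b""):
    """Approximate size in bytes of one HTTP/1.1 message (hypothetical helper).

    start_line: e.g. "GET / HTTP/1.1" or "HTTP/1.1 200 OK"
    headers:    any mapping of header names to values
    body:       the message body as bytes
    """
    # Start line followed by CRLF.
    size = len(start_line.encode("latin-1")) + 2
    # Each header is sent as "Name: value" followed by CRLF.
    for name, value in headers.items():
        size += len(f"{name}: {value}".encode("latin-1")) + 2
    # Empty line (CRLF) that terminates the header section.
    size += 2
    return size + len(body)
```

With requests, one could call it as `http_message_size(f"{req.method} {req.path_url} HTTP/1.1", req.headers)` where `req = response.request`. This is only an approximation: headers added by lower layers may not appear in `req.headers`, and `response.content` is the decoded body, which can be larger than what was actually transferred if the server compressed it.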

After adding up the sizes of all the corresponding packets in Wireshark, I end up with 30784 bytes, not counting the TCP ACKs (an 8% difference in my case, which rises to 13% if I add the handshake, the HTTP request, and the connection teardown).

Some insights:

My advice would be to capture the HTTP traffic with tcpdump (which I define here as traffic on port 80, knowing that this is not strictly correct), and to process the output with some good old Python.

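You can use sudo tcpdump -n "tcp port 80" -w cap.pcap to dump all traffic on port 80 in both directions (a "dst port 80" filter would capture only the packets sent to the server, not the responses), and then refer to this SO question regarding how to process the output.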
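As a rough idea of what the processing step could look like, here is a minimal sketch that sums the packet sizes recorded in a capture file using only the standard library. It assumes cap.pcap is in the classic pcap format (not pcapng), and the name `pcap_totals` is mine. It counts every packet in the file, so the capture filter decides what is measured.

```python
import struct

# First four bytes of a classic pcap file, mapped to the struct byte-order prefix.
PCAP_BYTE_ORDER = {
    b"\xd4\xc3\xb2\xa1": "<",  # little-endian, microsecond timestamps
    b"\xa1\xb2\xc3\xd4": ">",  # big-endian, microsecond timestamps
    b"\x4d\x3c\xb2\xa1": "<",  # little-endian, nanosecond timestamps
    b"\xa1\xb2\x3c\x4d": ">",  # big-endian, nanosecond timestamps
}


def pcap_totals(data):
    """Return (packet_count, bytes_on_wire) for the bytes of a classic pcap file."""
    if len(data) < 24 or data[:4] not in PCAP_BYTE_ORDER:
        raise ValueError("not a classic pcap file")
    endian = PCAP_BYTE_ORDER[data[:4]]

    offset = 24  # skip the 24-byte global header
    packets = 0
    wire_bytes = 0
    while offset + 16 <= len(data):
        # Per-packet record header: ts_sec, ts_frac, captured length, original length.
        _sec, _frac, incl_len, orig_len = struct.unpack_from(endian + "IIII", data, offset)
        packets += 1
        wire_bytes += orig_len       # length of the packet as seen on the wire
        offset += 16 + incl_len      # jump over the captured packet data
    return packets, wire_bytes
```

Something like `pcap_totals(open("cap.pcap", "rb").read())` would then give the packet count and the total number of bytes, link-layer headers included. Splitting the total per request or per direction would require decoding the IP and TCP headers, which this sketch deliberately leaves out.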

Hope it'll be helpful.

3kt