I am writing a script to scrape data from the web. I would like to calculate the size of each request and response to monitor my network consumption. Is there any way to do this?
1 Answer
It is tempting to use the Content-Length response header to get an idea of the network usage:
>>> import requests
>>> response = requests.get('http://edition.cnn.com')
>>> response.headers['Content-Length']
'28321'
Problem:
This is not accurate from a network consumption standpoint. The Content-Length header field only gives the size of the HTTP response body; it does not account for the HTTP headers or for the Ethernet/IP/TCP headers of every packet. It can also be missing altogether, for example when the server uses chunked transfer encoding.
After adding up the sizes of all the corresponding packets in Wireshark, I end up with 30784 bytes, not counting the TCP ACKs (a difference of about 8% in my case, which rises to 13% if I add the handshake, the HTTP request and the connection closure).
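The HTTP-level share of that overhead (start line and headers) can be estimated without a packet capture. The following is a minimal sketch, assuming HTTP/1.1 framing; http_message_size is a hypothetical helper, not part of requests, and it still ignores the Ethernet/IP/TCP overhead.

```python
def http_message_size(start_line, headers, body=b""):
    """Approximate size in bytes of one HTTP/1.1 message (request or response).

    start_line: e.g. "GET / HTTP/1.1" or "HTTP/1.1 200 OK" (without CRLF)
    headers:    mapping of header names to values
    body:       bytes, str or None
    """
    size = len(start_line.encode("latin-1")) + 2  # start line + CRLF
    for name, value in headers.items():
        # Each header is sent as "Name: value" followed by CRLF.
        size += len(f"{name}: {value}".encode("latin-1")) + 2
    size += 2  # empty line separating the headers from the body
    if body is None:
        body = b""
    elif isinstance(body, str):
        body = body.encode("utf-8")  # assumption: text bodies are sent as UTF-8
    return size + len(body)
```

With requests, the request side could be fed from response.request (method, path_url, headers, body) and the response side from response.status_code, response.reason, response.headers and response.content. Keep in mind that response.content is already decompressed, so for gzip-encoded responses its length can be larger than what actually crossed the network.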
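Some insights:
My advice would be to capture the HTTP traffic with tcpdump (which I define here as traffic to port 80, knowing that this is only an approximation), and to process the output with some good old Python.
You can use sudo tcpdump -n "dst port 80" -w cap.pcap to dump all traffic to port 80, and then refer to this SO question regarding how to process the output. Note that "dst port 80" only matches packets sent to the server; the filter "port 80" matches both directions, which you need if you also want to count the responses.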
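As a rough illustration of the "good old Python" step, here is a minimal sketch that sums the size of every packet in a capture using only the standard library. It assumes the file is in the classic pcap format (not pcapng); pcap_totals is a hypothetical helper name, and the function does no per-connection or per-direction filtering.

```python
import struct

# Magic numbers of the classic pcap format, mapped to the struct byte order.
PCAP_MAGICS = {
    b"\xd4\xc3\xb2\xa1": "<",  # little-endian, microsecond timestamps
    b"\xa1\xb2\xc3\xd4": ">",  # big-endian, microsecond timestamps
    b"\x4d\x3c\xb2\xa1": "<",  # little-endian, nanosecond timestamps
    b"\xa1\xb2\x3c\x4d": ">",  # big-endian, nanosecond timestamps
}


def pcap_totals(stream):
    """Return (packet_count, total_bytes_on_wire) for a classic pcap stream."""
    global_header = stream.read(24)  # the file header is 24 bytes long
    if len(global_header) < 24 or global_header[:4] not in PCAP_MAGICS:
        raise ValueError("not a classic pcap file")
    byte_order = PCAP_MAGICS[global_header[:4]]

    packets = 0
    total = 0
    while True:
        record_header = stream.read(16)  # each packet has a 16-byte header
        if len(record_header) < 16:
            break  # end of file (or truncated trailing record)
        _ts_sec, _ts_frac, incl_len, orig_len = struct.unpack(
            byte_order + "IIII", record_header
        )
        stream.read(incl_len)  # skip the captured bytes themselves
        packets += 1
        total += orig_len  # original length of the packet on the wire
    return packets, total
```

It could be called as pcap_totals(open("cap.pcap", "rb")). The total includes the link-layer, IP and TCP headers of every captured packet, which is exactly the overhead that Content-Length leaves out.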
Hope it'll be helpful.
I tried to follow your approach, but there is no "Content-Length" in my request header. – supersigdel Jun 23 '16 at 12:27