I found another question that asked for the same type of functionality, but the question is more than 2 years old so I was wondering if anybody has seen anything since then.
I've basically written my own asynchronous HTTP/socket client using the standard .NET sockets. I maintain a pool of 1024 sockets, and 128 "service" threads use that pool to download web pages from the internet at a rate of up to 371 pages per second (just tested today on a single Amazon EC2 server). I also made another asynchronous HTTP client which uses HttpWebRequest to asynchronously download web pages, but it's SIGNIFICANTLY slower: my throughput averages about 50 pages per second (also tested on Amazon EC2) using the same setup: 1024 pooled HttpWebRequest objects and 128 "service" threads.
Naturally, providing HTTP protocol support will take up some more processing power and memory. I'm hoping that with Amazon's Extra Large EC2 server I will not be restricted by the processing power/memory, but by the network bandwidth only (which has been the case so far).
An example of the machine(s) that I'm using is Amazon's High-CPU Extra Large Instance:
- 7 GB of memory
- 20 EC2 Compute Units (8 virtual cores with 2.5 EC2 Compute Units each)
- 1690 GB of instance storage
- 64-bit platform
- I/O Performance: High
- API name: c1.xlarge
I can write my own HTTP processing that complies with the HTTP protocol, but it would save me a TON of work, pain and suffering if there were an off-the-shelf solution that is fast and robust.
I need the following functionality at the very minimum:
- Building HTTP HEAD/GET (and maybe POST) requests
- Parsing HTTP responses from a binary stream
- Cookie support
- LGPL license
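To make the first three requirements concrete, here is a rough sketch of the kind of code I would otherwise have to write and harden myself. The names RawHttp, BuildRequest and TryParseHead are made up for illustration and do not come from any library. It assumes the header bytes are plain ASCII, and it deliberately ignores chunked transfer encoding, redirects, compression and real cookie handling (expiry, domain/path matching), which is exactly the part I'd rather not reinvent.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper, for illustration only.
public static class RawHttp
{
    // Builds the bytes of a minimal HTTP/1.1 request (HEAD or GET, no body).
    // "cookie" is an already formatted Cookie header value, e.g. "sid=abc".
    public static byte[] BuildRequest(string method, string host, string path, string cookie)
    {
        var sb = new StringBuilder();
        sb.Append(method).Append(' ').Append(path).Append(" HTTP/1.1\r\n");
        sb.Append("Host: ").Append(host).Append("\r\n");
        sb.Append("Connection: keep-alive\r\n");
        if (!string.IsNullOrEmpty(cookie))
            sb.Append("Cookie: ").Append(cookie).Append("\r\n");
        sb.Append("\r\n"); // blank line ends the request head
        return Encoding.ASCII.GetBytes(sb.ToString());
    }

    // Tries to parse the status line and headers from the first "count" bytes
    // read off the socket. Returns false if the head is incomplete or malformed.
    public static bool TryParseHead(byte[] buffer, int count, out int statusCode,
        out Dictionary<string, List<string>> headers, out int bodyOffset)
    {
        statusCode = 0;
        headers = null;
        bodyOffset = 0;

        // Locate the CRLF CRLF sequence that terminates the head.
        int end = -1;
        for (int i = 0; i + 3 < count; i++)
        {
            if (buffer[i] == 13 && buffer[i + 1] == 10 &&
                buffer[i + 2] == 13 && buffer[i + 3] == 10)
            {
                end = i;
                break;
            }
        }
        if (end < 0) return false; // need more data from the socket

        string head = Encoding.ASCII.GetString(buffer, 0, end);
        string[] lines = head.Split(new[] { "\r\n" }, StringSplitOptions.None);

        // Status line looks like: HTTP/1.1 200 OK
        string[] status = lines[0].Split(' ');
        if (status.Length < 2 || !int.TryParse(status[1], out statusCode))
            return false;

        // Header names are case-insensitive; one name may repeat (e.g. Set-Cookie).
        headers = new Dictionary<string, List<string>>(StringComparer.OrdinalIgnoreCase);
        for (int i = 1; i < lines.Length; i++)
        {
            int colon = lines[i].IndexOf(':');
            if (colon <= 0) continue; // skip lines that are not "name: value"
            string name = lines[i].Substring(0, colon).Trim();
            string value = lines[i].Substring(colon + 1).Trim();
            List<string> list;
            if (!headers.TryGetValue(name, out list))
            {
                list = new List<string>();
                headers[name] = list;
            }
            list.Add(value);
        }

        bodyOffset = end + 4; // first byte after the blank line
        return true;
    }
}
```

In my setup, a service thread would send the result of BuildRequest("GET", host, path, cookie) over a pooled socket, then call TryParseHead on the receive buffer after each read until it returns true. Any Set-Cookie values would then need to be stored and replayed on later requests.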
Does anybody know of any such solutions?