
I found another question that asked for the same type of functionality, but the question is more than 2 years old so I was wondering if anybody has seen anything since then.

I've basically written my own asynchronous HTTP/socket client using the standard .NET sockets. I maintain a pool of 1024 sockets, and 128 "service" threads use that pool to download web pages from the internet at a rate of up to 371 pages per second (just tested today on a single Amazon EC2 server). I also made another asynchronous HTTP client that uses HttpWebRequest, but it's SIGNIFICANTLY slower: its throughput averages about 50 pages per second (also tested on Amazon EC2) with the same setup: 1024 pooled HttpWebRequests and 128 "service" threads.

Naturally, providing HTTP protocol support will take up some more processing power and memory. I'm hoping that with Amazon's Extra Large EC2 server I will not be restricted by the processing power/memory, but by the network bandwidth only (which has been the case so far).

An example of the machine(s) that I'm using is Amazon's High-CPU Extra Large Instance:

  • 7 GB of memory
  • 20 EC2 Compute Units (8 virtual cores with 2.5 EC2 Compute Units each)
  • 1690 GB of instance storage
  • 64-bit platform
  • I/O Performance: High
  • API name: c1.xlarge

I can write my own HTTP processing which complies with the HTTP protocol, but it will save me a TON of work, pain and suffering if there is an off-the-shelf solution that is fast and robust.

I need the following functionality at the very minimum:

  • Building HTTP HEAD/GET (and maybe POST) requests
  • Parsing an HTTP response from a binary stream
  • Cookie support
  • LGPL license
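
To make the first three items concrete, here is roughly what I mean at the byte level. This is only a minimal sketch, written in Python for brevity rather than .NET; `build_request` and `parse_response_head` are hypothetical names I made up for illustration, not part of any library, and the parser assumes a well-formed response and ignores chunked encoding and the body.

```python
def build_request(method, host, path="/", headers=None, cookies=None):
    """Serialize an HTTP/1.1 request head (no body) to bytes."""
    lines = [f"{method} {path} HTTP/1.1", f"Host: {host}"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    if cookies:
        # All cookies for the request go into a single Cookie header.
        pairs = "; ".join(f"{k}={v}" for k, v in cookies.items())
        lines.append(f"Cookie: {pairs}")
    # A blank line terminates the request head.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


def parse_response_head(buffer):
    """Parse the status line and headers out of raw received bytes.

    Returns None while the head is still incomplete, otherwise
    (status, reason, headers, body_offset). Headers are kept as a list
    of (lowercased name, value) pairs because Set-Cookie may repeat.
    """
    end = buffer.find(b"\r\n\r\n")
    if end == -1:
        return None  # need more data from the socket
    head = buffer[:end].decode("iso-8859-1")
    status_line, *header_lines = head.split("\r\n")
    _version, status, *reason = status_line.split(" ", 2)
    headers = []
    for line in header_lines:
        name, _, value = line.partition(":")
        headers.append((name.strip().lower(), value.strip()))
    return int(status), (reason[0] if reason else ""), headers, end + 4
```

The intended use is to append each received chunk to a per-socket buffer and call `parse_response_head` until it stops returning None; cookies would then be collected from the `set-cookie` entries and passed back into `build_request` on the next request to the same host.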

Does anybody know of any such solutions?

Kiril
  • Did you try the performance of `WebClient`? – jgauffin May 03 '11 at 20:04
  • @jgauffin, I have not tried the `WebClient` yet. It seems like this may be a viable answer; please make sure you post it so I can accept it once I test it out. – Kiril May 03 '11 at 20:07
  • I wrote a comment since I don't know if it's more performant. But I wrote it as an answer per your wish. ;) – jgauffin May 03 '11 at 20:09

1 Answer


I don't know how HttpWebRequest works with sockets internally. Opening and closing sockets for every request might be a big performance hit. WebClient uses keep-alive and might work better.
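
If you stay with raw sockets, the same keep-alive idea can be approximated by parking idle connections per endpoint and reusing them instead of reconnecting. The following is only a rough sketch under my own assumptions, in Python rather than .NET to keep it short; `EndpointPool` and its parameters are hypothetical names, and it only helps when the same host is requested again while the server still holds the connection open.

```python
import socket
import threading
from collections import defaultdict, deque


class EndpointPool:
    """Keeps idle keep-alive connections per (host, port) for reuse."""

    def __init__(self, connect=socket.create_connection, max_idle_per_endpoint=4):
        # `connect` is injectable so the pool can be exercised without a network.
        self._connect = connect
        self._max_idle = max_idle_per_endpoint
        self._idle = defaultdict(deque)
        self._lock = threading.Lock()  # several service threads share the pool

    def acquire(self, host, port=80):
        """Return an idle connection to the endpoint, or open a new one."""
        with self._lock:
            idle = self._idle[(host, port)]
            if idle:
                return idle.pop()
        return self._connect((host, port))

    def release(self, host, port, conn, keep_alive=True):
        """Park the connection for reuse, or close it if it cannot be kept."""
        with self._lock:
            idle = self._idle[(host, port)]
            if keep_alive and len(idle) < self._max_idle:
                idle.append(conn)
                return
        conn.close()
```

A caller would pass `keep_alive=False` whenever the response carried `Connection: close` or the read failed, so that dead sockets never go back into the pool.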

Edit: I did a bit of googling and I wouldn't accept this as an answer. WebClient seems to be a wrapper around HttpWebRequest/Response: http://www.codeproject.com/Articles/156610/WP7-WebClient-vs-HttpWebRequest.aspx?msg=3775084

Update

Since you have started with sockets, I would stick with them. Feel free to take stuff from my webserver project: http://webserver.codeplex.com

My parser:

http://webserver.codeplex.com/SourceControl/changeset/view/56552#671689

jgauffin
  • Does the `WebClient` allow the client to connect to multiple different endpoints? I have thousands of endpoints and the async sockets work very well with them so far (even after closing the connection)... I still can't figure out how to reuse the sockets tho: http://stackoverflow.com/questions/5762276/reuse-asynchronous-socket-subsequent-connect-attempts-fail – Kiril May 03 '11 at 20:10
  • I would stick to sockets and use a parser. Feel free to borrow ideas from the parser in my webserver project: http://webserver.codeplex.com. I wouldn't generate header objects though; doing so is not very performant. – jgauffin May 03 '11 at 20:15