I am adding this answer as another possibility, which people may encounter when:
- downloading from multiple servers using multi-threaded apps
- using Windows XP or Vista as the operating system
The tcpip.sys driver for these operating systems limits outbound connection attempts: no more than 10 may be in the half-open state at once (this is commonly quoted as "10 connections per second"). It is a limit on initiating connections, not on the total number of connections, so you can have hundreds of established connections, but you cannot have more than 10 being established at the same time. The limit was imposed by Microsoft to curtail the spread of certain types of virus/worm. Whether such methods are effective is outside the scope of this answer.
In a multi-threaded application that downloads from many servers, this limitation can manifest as a series of timeouts. Once the limit of 10 is reached, Windows queues any further "half-open" (initiated but not yet established) connection attempts. In my application, for example, I had 20 threads ready to process connections, but I found that sometimes I would get timeouts from servers I knew were operating and reachable.
To verify that this is happening, check the operating system's event log, under System. The error is:
EventID 4226: TCP/IP has reached the security limit imposed on the number of concurrent TCP connect attempts.
There are many references to this error and plenty of patches and fixes that remove the limit. However, because this problem is frequently encountered by P2P (Torrent) users, there is a great deal of malware disguised as this patch.
I have a requirement to collect data from over 1200 servers (that are actually data sensors) at 5-minute intervals. I initially developed the application (on WinXP) to reuse 20 threads repeatedly to crawl the list of servers and aggregate the data into a SQL database. Because the connections were initiated from a timer tick event, this error happened often: at the moment the threads were invoked, none of the connections were established yet, so 10 were immediately queued.
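To illustrate why the timer-tick burst matters: 1200 servers polled every 5 minutes averages only 4 new connections per second if the starts are spread evenly across the interval, rather than fired all at once. The following is an illustrative Python sketch of that arithmetic, not the application's actual code; `start_offsets` is a hypothetical helper name.

```python
def start_offsets(server_count, interval_seconds):
    """Return evenly spaced start times (seconds after the tick), one per server."""
    if server_count <= 0:
        return []
    spacing = interval_seconds / server_count
    return [i * spacing for i in range(server_count)]


# Figures from this answer: 1200 servers, polled every 5 minutes (300 s).
offsets = start_offsets(1200, 300)
starts_per_second = 1200 / 300  # average rate of new connection attempts
```

Whether spreading the polls over the whole interval is acceptable depends on how closely the samples need to line up in time; a shorter window trades a higher start rate for tighter alignment.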
This isn't necessarily a problem, because as connections are established, the queued ones are then processed. However, if the non-queued connections are slow to establish, the time spent waiting can (in my experience) eat into the timeout of the queued connections. The result, looking at my application log file, was that I would see a batch of connections that timed out, followed by a majority of connections that were successful. Opening a web browser to test the "timed out" connections was confusing, because the servers were available and quick to respond.
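For anyone who would rather not patch the driver, one possible workaround is to cap the number of connection attempts in flight inside the application, so that Windows never has to queue any. The following is a minimal Python sketch under stated assumptions, not what my application did: `connect_throttled` and `MAX_PENDING_CONNECTS` are hypothetical names, and the value 8 is simply an assumed margin below the limit of 10.

```python
import socket
import threading

# Assumed value: kept below the OS limit of 10 pending connection attempts.
MAX_PENDING_CONNECTS = 8

_connect_slots = threading.BoundedSemaphore(MAX_PENDING_CONNECTS)


def connect_throttled(host, port, timeout=10.0, connect=socket.create_connection):
    """Open a TCP connection while holding one of the limited connect slots.

    The slot is released as soon as the connect call returns or raises, so
    established connections do not count against the cap. The `connect`
    parameter exists only so a stand-in can be supplied for testing.
    """
    with _connect_slots:
        return connect((host, port), timeout=timeout)
```

Because a thread waits for a slot before calling `connect`, the connect timeout only starts counting once the attempt is actually made, so time spent waiting in the application does not count against it.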
I decided to try hex-editing the tcpip.sys file, as suggested in a guide at speedguide.net. The checksum of my file differed from the one in the guide (I had SP3, not SP2), and the comments on the guide weren't particularly helpful. However, I did find a patch that worked for SP3 and noticed an immediate difference after applying it.
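Comparing checksums before editing a system driver is worth doing, since a mismatch means the published byte offsets probably do not apply to your build. The following is a small Python sketch for computing a file's checksums. Which algorithm a given guide publishes is an assumption here, so the hypothetical helper `file_checksums` returns three common ones.

```python
import hashlib
import zlib


def file_checksums(path, chunk_size=65536):
    """Return the MD5, SHA-1 and CRC32 of a file as lowercase hex strings."""
    md5 = hashlib.md5()
    sha1 = hashlib.sha1()
    crc = 0
    with open(path, "rb") as f:
        # Read in chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
            sha1.update(chunk)
            crc = zlib.crc32(chunk, crc)
    return {
        "md5": md5.hexdigest(),
        "sha1": sha1.hexdigest(),
        "crc32": format(crc & 0xFFFFFFFF, "08x"),
    }


# Example (path is an assumption; adjust for your Windows directory):
# print(file_checksums(r"C:\WINDOWS\system32\drivers\tcpip.sys"))
```

If none of the values match the one in the guide, treat its instructions as written for a different version of the file.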
From what I can find, Windows 7 does not have this limitation, and since I moved the application to a Windows 7-based machine, the timeout problem has not recurred.