I have quite a strange situation.

I have this very simple package:

  • The "get list" task retrieves a one-column data table from an assembly, containing the list of URLs to be run, and stores it in an object variable.
  • The "foreach" loop iterates over the object variable and loads each URL into a string variable named URL.
  • The "run" task calls the URL with this code (it's 2005, so I'm stuck with VB):

    Dim myURI As New Uri("http://" + Dts.Variables("URL").Value.ToString())
    Dim myWebClient As New System.Net.WebClient
    myWebClient.OpenReadAsync(myURI)
    

The URL being called is internal. It just reads the parameters and performs a series of operations that take some time, which is why I used "OpenReadAsync".

My problem is: if I have 4 URLs to run, the package runs only 2 of them. The loop iterates 4 times, the script is called 4 times (I can see this when I debug it), and the line myWebClient.OpenReadAsync(myURI) is executed 4 times with 4 different values, but only 2 calls to the URL are made.

If I run the package again, the other 2 URLs are called, which proves that there isn't anything wrong with the URLs. If I call the 4 URLs manually in the browser (in 4 tabs, for example) one right after another, they all produce the expected result, which proves that there is nothing wrong with the code that parses the URL.

So I'm left with the VB code. It's the first time I'm using Uri and WebClient, so I wonder if I'm doing something wrong. I also tried adding a 5-second sleep between the calls, but no luck.

Any help would be appreciated. Thanks

Panagiotis Kanavos
Diego
  • What if you switch over to using the synchronous OpenRead method? – billinkc May 23 '12 at 15:40
  • Hi billinkc! I get a timeout after the second run. It's strange because the 4 URLs I have should run in a few seconds; in fact I can see (I have a log) that the second one ran 5 seconds after the first one. And if I run the package a second time, the 2 remaining URLs run fine, so it's definitely something to do with calling the code more than 2 times – Diego May 23 '12 at 15:53
  • Whenever I run into "weird" code issues in SSIS, I dump the code out to a .NET console app and see if I can reproduce the behaviour there. I assume you've already tried that but in case you haven't, that might be a place to turn seeing as there hasn't been much love for your bounty. Also, what's your full code look like? Any chance that all 4 URLs are being called but since they're async calls, you just don't observe the effects until later? What if you put a longer thread.sleep in there, something to match the expected process duration? Defeats purpose of async, I know but may shed some light – billinkc Jun 05 '12 at 02:55
  • Hi billinkc! What's in the question is my full code! I'm positive the URLs are not being called, because there is a DB interaction and I can check the DB. I also added a longer sleep and it didn't help. Can you give me more details on how you do the dump? I only know how to dump errors using DTEXEC on the command line, and I don't think that applies here. You can add this as an answer so I can mark it, since you are the one who has helped me the most so far. Thanks – Diego Jun 05 '12 at 09:13

2 Answers

The HTTP/1.1 specification (RFC 2616) recommends that clients, browsers included, keep no more than 2 concurrent connections per host, to avoid overloading the host. .NET follows this rule and by default allows only 2 concurrent connections to a host. You can change this limit either by modifying an application's config file or through code.
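As a rough sketch of the config-file route, the `connectionManagement` element under `system.net` raises the limit. The value 5 is only illustrative, and which config file the SSIS host process actually reads (for example the one belonging to DTExec.exe) is an assumption you would need to verify for your installation:

```xml
<configuration>
  <system.net>
    <connectionManagement>
      <!-- address="*" applies the limit to every host; 5 is an illustrative value -->
      <add address="*" maxconnection="5" />
    </connectionManagement>
  </system.net>
</configuration>
```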

The delay you added to the script didn't work because you didn't call Dispose on the WebClient instance. The WebClient keeps its connection open until you dispose of it, so that the response stream can still be read. Until then you will not be able to connect to the same host again unless the garbage collector collects the client.

Besides, OpenReadAsync opens the response stream and keeps it open until you close it or it gets collected. You should use one of the DownloadXXXAsync methods to avoid opening the stream without a reason.

A better solution would be to call DownloadStringAsync and dispose of the client in the DownloadStringCompleted event.
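A minimal sketch of that approach, written as a console module so it can be tried outside SSIS. The module name, the helper `CallUrl`, the handler name and the host in `Main` are all hypothetical; only `WebClient`, `DownloadStringAsync` and the `DownloadStringCompleted` event come from the framework:

```vb
Imports System
Imports System.Net

Module DownloadSketch

    ' Runs when the request completes, fails, or is cancelled.
    Private Sub OnDownloadCompleted(ByVal sender As Object, ByVal e As DownloadStringCompletedEventArgs)
        ' The sender is the WebClient that raised the event.
        Dim client As WebClient = DirectCast(sender, WebClient)
        Try
            If e.Cancelled Then
                Console.WriteLine("Request cancelled")
            ElseIf e.Error IsNot Nothing Then
                Console.WriteLine("Request failed: " & e.Error.Message)
            Else
                Console.WriteLine("Received " & e.Result.Length.ToString() & " characters")
            End If
        Finally
            ' Dispose of the client once the response has been handled.
            client.Dispose()
        End Try
    End Sub

    ' Hypothetical helper: starts the request and returns immediately.
    Public Sub CallUrl(ByVal url As String)
        Dim client As New WebClient()
        AddHandler client.DownloadStringCompleted, AddressOf OnDownloadCompleted
        client.DownloadStringAsync(New Uri("http://" & url))
    End Sub

    Sub Main()
        ' Placeholder address; replace it with the internal URL being called.
        CallUrl("localhost/")
        ' Keep the process alive long enough for the callback to fire.
        Console.ReadLine()
    End Sub

End Module
```

In a Script Task, the body of `CallUrl` would take the place of the three lines in the question, with the address read from `Dts.Variables("URL")`.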

UPDATE:

ServicePointManager.DefaultConnectionLimit is stored in a static field which means that its scope is the entire AppDomain. SSIS uses a single AppDomain for each package execution so the value will affect the entire package.
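A small sketch of setting the global limit in VB, assuming a limit of 5 as mentioned in the comments. The module name is hypothetical. The new default only applies to ServicePoint objects created after the change, so it should be set before the first request is made:

```vb
Imports System
Imports System.Net

Module ConnectionLimitSketch

    Sub Main()
        ' Applies to every ServicePoint created afterwards in this AppDomain.
        ServicePointManager.DefaultConnectionLimit = 5
        Console.WriteLine(ServicePointManager.DefaultConnectionLimit)
    End Sub

End Module
```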

If you want to modify the connection limit for a single host only, you can use FindServicePoint to get the ServicePoint for the host address and set the limit just for that address:

    var myTarget = ServicePointManager.FindServicePoint(new Uri("http://www.google.com"));
    myTarget.ConnectionLimit = 10;
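Since the script task in the question is written in VB, a possible VB equivalent of the snippet above might look like this. The module name is hypothetical, and the address is the same placeholder, which you would replace with the internal host:

```vb
Imports System
Imports System.Net

Module PerHostLimitSketch

    Sub Main()
        ' Look up (or create) the ServicePoint that manages connections to this host.
        Dim myTarget As ServicePoint = ServicePointManager.FindServicePoint(New Uri("http://www.google.com"))
        ' Raise the limit for this host only; other hosts keep the default.
        myTarget.ConnectionLimit = 10
    End Sub

End Module
```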
Panagiotis Kanavos
  • I added inside but I still get the same behaviour – Diego Jun 05 '12 at 12:29
  • Ignore my last comment. I added the code from your second link to my script task with a limit of 5 and it did work perfectly. What I don't understand is: I didn't make any reference to my WebClient object. How did it "read" this setting? Is it a global setting? Thanks a lot – Diego Jun 05 '12 at 12:42
  • The value is stored in a static field which means it is global to the AppDomain. SSIS uses a single appdomain for each package execution so there is no risk of the change affecting other executions. Updated the answer with code to change the limit for a single address only – Panagiotis Kanavos Jun 05 '12 at 13:08
  • That's amazing, thanks again. Another quick question: would you know how to call a Uri while passing credentials? I would need to hard-code an AD user and password, but I've had no luck so far – Diego Jun 05 '12 at 13:15
  1. Try to extend your timeout for every task and subtask.

  2. I wasn't asked, but I would hard-code a task like this instead of using SSIS. SSIS is perfect for ETL but not much else!

Brandon Arnold