
I have a program that loops through a list of apps.

Apps
--------
App1
App2
App3

Now, for each of them, I make an HTTP request to get that app's list of builds as XML.

So a request like,

http://example.com/getapplist.do?appid=App1

gives me a response like,

<appid name="App1">
  <buildid BldName="Bld3" Status="Not Ready"></buildid> 
  <buildid BldName="Bld2" Status="Ready"></buildid>
  <buildid BldName="Bld1" Status="Ready"></buildid>
</appid>

Now I get the highest build number with Status "Ready" and then make another web API call like,

http://example.com/getapplist.do?appid=App1&bldid=Bld2

This gives me a response like,

 <buildinfo appid="App1" buildid="Bld2" value="someinfo"></buildinfo>
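The "highest Ready build" step can be sketched with LINQ to XML. This is a minimal sketch, assuming build names carry a numeric suffix (as in the sample response); `BuildPicker` and `HighestReadyBuild` are made-up names for illustration:

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

class BuildPicker
{
    // Pick the highest-numbered build whose Status is "Ready".
    // Assumption: BldName ends in a numeric suffix, e.g. "Bld12".
    public static string HighestReadyBuild(string responseXml)
    {
        var doc = XDocument.Parse(responseXml);
        return doc.Descendants("buildid")
                  .Where(b => (string)b.Attribute("Status") == "Ready")
                  .OrderByDescending(b =>
                      int.Parse(new string(((string)b.Attribute("BldName"))
                          .SkipWhile(c => !char.IsDigit(c)).ToArray())))
                  .Select(b => (string)b.Attribute("BldName"))
                  .FirstOrDefault();
    }
}
```

On the sample response above this would select "Bld2", since "Bld3" is not Ready.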

I feed these into internal data tables. But this program takes painfully long to complete (about 3 hours), since I have close to 2000 appids and there are 2 web requests per id. I tried to fix this using a BackgroundWorker as specified here. I thought of collating all the info from the http responses into a single XML file and then using that XML for further processing, but this throws the error,

file being used by another process

So my code looks like,

if (!backgroundWorker1.IsBusy) 
{
    for(int i = 0; i < appList.Count; i++)
    { 
        BackgroundWorker bgw = new BackgroundWorker();
        bgw.WorkerReportsProgress = true;  
        bgw.WorkerSupportsCancellation = true;                     
        bgw.DoWork += new DoWorkEventHandler(bgw_DoWork);                   
        bgw.ProgressChanged += new ProgressChangedEventHandler(bgw_ProgressChanged);
        bgw.RunWorkerCompleted += new RunWorkerCompletedEventHandler(bgw_RunWorkerCompleted);
        //Start The Worker 
        bgw.RunWorkerAsync();
    }
}

And the DoWork function picks the tag values and puts them into an XML file.

What is the best way I can get the app- buildinfo details into a common file from all the http responses from all the background workers?

mhn
    This generates 2000 backgroundworkers...not good...instead call your webapi async from one backgroundworker...and use a lock when one of the async webrequest completed events fires and writes to the xml file. – rene Aug 30 '14 at 10:48
  • Could you please add your `DoWork` method code? – Yuval Itzchakov Aug 30 '14 at 11:07
  • @rene , So what is the limit I have to set it to? Also, If I set a limit of say 5 Background workers, does that mean the threads would run in parallel until all 2K urls are worked on? – mhn Aug 30 '14 at 11:11
  • @YuvalItzchakov . In my DoWork, currently, I have some code to build a string from the XML response i get and then do a simple System.IO.File.WriteAllText(filename,stringvalue) – mhn Aug 30 '14 at 11:16
    try [Parallel.For](http://msdn.microsoft.com/en-us/library/system.threading.tasks.parallel.for(v=vs.110).aspx) or [Parallel.ForEach](http://msdn.microsoft.com/en-us/library/system.threading.tasks.parallel.foreach(v=vs.110).aspx) which is much simpler, and don't have to worry about the performance degradation with too many threads. – bansi Aug 30 '14 at 11:17
  • @bansi is it .net 4.0 compatible? – mhn Aug 30 '14 at 11:19
  • @bansi No reason to use `Parallel.ForEach` with IO bound work. – Yuval Itzchakov Aug 30 '14 at 11:45

2 Answers


HTTP requests are IO bound and asynchronous by nature, so there is no reason to use background workers to accomplish what you need.

You can take advantage of async-await, which is available for .NET 4 via Microsoft.Bcl.Async, together with HttpClient:

private async Task ProcessAppsAsync(List<string> appList)
{
    var httpClient = new HttpClient();

    // This will execute your IO requests concurrently,
    // no need for extra threads. Each entry in appList is
    // assumed to hold the request url.
    var appListTasks = appList.Select(app => httpClient.GetStringAsync(app)).ToList();

    // Wait asynchronously for all of them to finish
    await Task.WhenAll(appListTasks);

    // Process each Task.Result and aggregate them into an xml
    using (var streamWriter = new StreamWriter(@"PathToFile"))
    {
        foreach (var task in appListTasks)
        {
            await streamWriter.WriteAsync(task.Result);
        }
    }
}

This way, you process all requests concurrently and handle results from all of them once they've completed.
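If firing ~2000 requests at once opens too many connections, the concurrency can be capped with SemaphoreSlim. A minimal sketch, where `ThrottledFetcher`, the `fetch` delegate, and the limit of 20 are illustrative assumptions (with a real HttpClient you would pass `url => httpClient.GetStringAsync(url)` as the delegate):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

class ThrottledFetcher
{
    // Run all requests concurrently, but never more than
    // maxConcurrent at a time. Results keep the input order.
    public static async Task<string[]> FetchAllAsync(
        IEnumerable<string> urls,
        Func<string, Task<string>> fetch,
        int maxConcurrent = 20)
    {
        var throttle = new SemaphoreSlim(maxConcurrent);

        var tasks = urls.Select(async url =>
        {
            await throttle.WaitAsync();   // wait for a free slot
            try
            {
                return await fetch(url);
            }
            finally
            {
                throttle.Release();       // free the slot for the next request
            }
        }).ToList();

        return await Task.WhenAll(tasks);
    }
}
```

Taking the fetch operation as a delegate keeps the throttling logic independent of HttpClient, which also makes it easy to exercise without a network.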

Yuval Itzchakov
  • @rene The OP can write synchronously if needed after `Task.WhenAll`, although i see no reason for him to do that. – Yuval Itzchakov Aug 30 '14 at 12:41
  • @mhn I added code writing asynchronously to a file using `StreamWriter`, if needed. – Yuval Itzchakov Aug 30 '14 at 13:17
  • In spite of adding Microsoft.Bcl.Async and HttpClient, as you suggested, I get the "Type or namespace async could not be found" error. Any pointers? – mhn Aug 31 '14 at 09:23
  • happened to stumble upon http://stackoverflow.com/questions/19421878/how-can-i-use-the-async-keywords-in-a-project-targetting-net-4-0/19421907#19421907 . But i do not have VS 2012 installed :( – mhn Aug 31 '14 at 09:46
  • You can download VS2012 express, which is free. – Yuval Itzchakov Aug 31 '14 at 10:01
  • I can try this on my local machine. But the working code would be a step in an SSIS Script component on another Server Machine and I do not have permissions to install there. So is there a work around? – mhn Aug 31 '14 at 13:21

This solution works for .NET 2.0 and up. It uses the async methods of the WebClient class, a counter decremented via the Interlocked class, and an ordinary lock to serialize writing the results to the file.

var writer = XmlWriter.Create(
    new FileStream("api.xml",
                    FileMode.Create));
writer.WriteStartElement("apps"); // root element in the xml
// lock for one write
object writeLock = new object(); 
// this many calls            
int counter = appList.Count;

foreach (var app in appList)
{
    var wc = new WebClient();

    var url = String.Format(
        "http://example.com/getapplist.do?appid={0}&bldid=Bld2", 
        app);
    wc.DownloadDataCompleted += (o, args) =>
        {
            try
            {
                var xd = new XmlDocument();
                xd.LoadXml(Encoding.UTF8.GetString(args.Result));
                lock (writeLock)
                {
                    xd.WriteContentTo(writer);
                }
            }
            finally
            {
                // count down our counter in a thread safe manner
                if (Interlocked.Decrement(ref counter) == 0)
                {
                    // this was the last one, close nicely
                    writer.WriteEndElement();
                    writer.Close();
                    ((IDisposable) writer).Dispose();
                }
            }
        };
    wc.DownloadDataAsync(
        new Uri(url));   
}
rene
  • Do you really see a benefit in making so many synchronous writes to a file instead of aggregating the results and writing once, leaving out lock contention at all? – Yuval Itzchakov Aug 30 '14 at 13:30
  • I'm not sure for this case but if the results are large enough memory might be an issue. Or in case of failure and it is expensive to re-run you would have intermediate results (but that would require restart-logic). I'm more concerned about the possibility of having that many network connections open. – rene Aug 30 '14 at 13:52
  • He can always throttle requests if needed. He can also process them as they finish using `Task.WhenAny`. – Yuval Itzchakov Aug 30 '14 at 16:11