
Hi, I am spidering a site and reading its contents. I want to keep the request rate reasonable; up to approximately 10 requests per second should be OK. Currently it makes about 5,000 requests per minute, which is causing security issues because it looks like bot activity. How can I do this? Here is my code:

protected void Iterareitems(List<Item> items)
{
    foreach (var item in items)
    {
        GetImagesfromItem(item);

        if (item.HasChildren)
        {
            Iterareitems(item.Children.ToList());
        }
    }
}

protected void GetImagesfromItem(Item childitems)
{
    var document = new HtmlWeb().Load(completeurl);
    var urls = document.DocumentNode.Descendants("img")
                .Select(e => e.GetAttributeValue("src", null))
                .Where(s => !string.IsNullOrEmpty(s)).ToList();
}
RB.
Rooney
  • Do you mean something like this? http://stackoverflow.com/questions/7728569/how-to-limit-method-usage-per-amount-of-time – user2900970 Jan 29 '16 at 09:04
  • `it is causing security issues as this looks to be a bot activity`... it doesn't *looks to be a bot activity*, it **is** a bot activity :-) – Jcl Jan 29 '16 at 09:08

1 Answer


You need System.Threading.Semaphore, which lets you control the maximum number of concurrent threads/tasks. Here is an example:

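// Requires: using System; using System.Threading; using System.Threading.Tasks;
var maxThreads = 3;
var semaphore = new Semaphore(maxThreads, maxThreads);

for (int i = 0; i < 10; i++)    // 10 tasks in total
{
    var j = i;                  // copy the loop variable for the closure
    Task.Factory.StartNew(() =>
    {
        semaphore.WaitOne();    // blocks while maxThreads tasks are already running
        try
        {
            Console.WriteLine("start " + j);
            Thread.Sleep(1000); // simulated work
            Console.WriteLine("end " + j);
        }
        finally
        {
            semaphore.Release(); // always free the slot, even if the work throws
        }
    });
}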

At most 3 tasks run at any one time. The others block in semaphore.WaitOne() because the limit has been reached, and a blocked task continues once another task frees a slot by calling semaphore.Release().
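Note that a semaphore caps how many requests run at the same time, not how many start per second. If the goal is the roughly 10 requests per second mentioned in the question, one option is to space the calls out in time. The following is only a minimal sketch under that assumption: the class name RequestThrottle and its Wait method are hypothetical helpers invented for illustration, not part of the question's code or of any library.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

// Hypothetical helper (not a framework type): spaces calls to Wait()
// at least 1/maxPerSecond seconds apart, across all calling threads.
public sealed class RequestThrottle
{
    private readonly object _gate = new object();
    private readonly TimeSpan _minInterval;
    private readonly Stopwatch _clock = Stopwatch.StartNew();
    private TimeSpan _nextSlot = TimeSpan.Zero;

    public RequestThrottle(int maxPerSecond)
    {
        if (maxPerSecond <= 0)
            throw new ArgumentOutOfRangeException(nameof(maxPerSecond));
        _minInterval = TimeSpan.FromSeconds(1.0 / maxPerSecond);
    }

    // Blocks the caller until its reserved time slot arrives.
    public void Wait()
    {
        TimeSpan delay;
        lock (_gate)
        {
            var now = _clock.Elapsed;
            // Take the next free slot, or "now" if the throttle has been idle.
            var slot = now > _nextSlot ? now : _nextSlot;
            _nextSlot = slot + _minInterval;
            delay = slot - now;
        }
        // Sleep outside the lock so other callers can reserve their slots.
        if (delay > TimeSpan.Zero)
            Thread.Sleep(delay);
    }
}
```

With a single shared instance such as new RequestThrottle(10), calling Wait() immediately before new HtmlWeb().Load(...) in GetImagesfromItem would hold the crawler to about 10 page loads per second. It can be combined with the semaphore if concurrency also needs a cap.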

Cheng Chen