I'm trying to run a task 100 times on each run (in parallel), but I can't get it to work.

I'm trying to brute-force an API. The API lets me concatenate as many IDs as I want in a single request (as long as the request doesn't exceed the timeout).

    // consts:
    // idsNum = 1000
    // maxTasks = 100

    // We prepare the ids that we will process (in this case 100,000)
    var ids = PrepareIds(i * idsNum * maxTasks, idsNum, maxTasks);

    // This was my old approach (didn't work)
    //var result = await Task.WhenAll(ids.AsParallel().Select(async x => (await client.GetItems(x)).ToArray()));

    // This is my new approach (this didn't work either...)
    var items = new List<item[]>();
    ids.AsParallel().Select(x => client.GetItems(x).GetAwaiter().GetResult()).ForAll(item =>
    {
        //Console.WriteLine("processed!");
        items.Add(item.ToArray());
    });

    var result = items.ToArray();

As you can see, I added a `Console.WriteLine("processed!");` statement to check whether anything runs, but I can't get this to work either.

These are my other methods:

    private static IEnumerable<ulong[]> PrepareIds(int startingId, int idsNum = 1000, int maxTasks = 100)
    {
        for (int i = 0; i < maxTasks; i++)
            yield return Range((ulong)startingId, (ulong)(startingId + idsNum)).ToArray();
    }
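
One thing worth checking in `PrepareIds`: the loop variable `i` is never used inside the loop, so each of the `maxTasks` batches appears to be built from the same arguments. `Range` is not shown here, so whether its second argument is an end value or a count is unknown. Also, `int` arithmetic such as `i * idsNum * maxTasks` overflows past 2,147,483,647, which matters when crawling more than 2,000,000,000 IDs. Below is a minimal sketch, assuming the intent is consecutive, non-overlapping batches; `IdBatches.Prepare` is a hypothetical replacement, not the original helper.

```csharp
using System;
using System.Collections.Generic;

public static class IdBatches
{
    // Hypothetical helper: yields batchCount consecutive, non-overlapping
    // batches of batchSize IDs each, starting at startingId.
    // All arithmetic is done in ulong to avoid int overflow for large IDs.
    public static IEnumerable<ulong[]> Prepare(ulong startingId, int batchSize, int batchCount)
    {
        for (int b = 0; b < batchCount; b++)
        {
            // First ID of this batch depends on the batch index b.
            ulong first = startingId + (ulong)b * (ulong)batchSize;

            var batch = new ulong[batchSize];
            for (int k = 0; k < batchSize; k++)
                batch[k] = first + (ulong)k;

            yield return batch;
        }
    }
}
```

With `idsNum = 1000` and `maxTasks = 100`, a call such as `IdBatches.Prepare(start, 1000, 100)` would cover 100,000 distinct IDs per run.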

And...

    public static async Task<IEnumerable<item>> GetItems(this HttpClient client, ulong[] ids, Action notFoundCallback = null)
    {
        var keys = PrepareDataInKeys(type, ids); // This prepares the data to be sent to the API server
        var content = new FormUrlEncodedContent(keys);

        content.Headers.ContentType =
            new MediaTypeHeaderValue("application/x-www-form-urlencoded") { CharSet = "UTF-8" };
        client.DefaultRequestHeaders.ExpectContinue = false;

        // We create a post request
        var response = await client.PostAsync(new Uri(FILE_URL), content);
        string contents = null;
        JObject jObject;
        try
        {
            contents = await response.Content.ReadAsStringAsync();
            jObject = JsonConvert.DeserializeObject<JObject>(contents);
        }
        catch (Exception ex)
        {
            Console.WriteLine(ex);
            Console.WriteLine(contents);
            return null;
        }

        // Then we read the items from the parsed JObject
        JArray items;
        try
        {
            items = jObject
               .GetValue("...")
               .ToObject<JObject>()
               .GetValue("...").ToObject<JArray>();
        }
        catch (Exception ex)
        {
            Console.WriteLine(ex);
            return null;
        }

        int notFoundItems = 0;
        int nonxxxItems = 0;
        int xxxItems = 0;

        var all = items.BuildEnumerable(notFoundCallback, item =>
        {
            if (item.Result != 1)
                ++notFoundItems;
            else if (item.id != 5)
                ++nonxxxItems;
            else
                ++xxxItems;
        });

        CrawledItems += ids.Length;

        NotFoundItems += notFoundItems;
        NonXXXItems += nonxxxItems;
        XXXItems += xxxItems;

        return all;
    }

    private static IEnumerable<item> BuildEnumerable(this JArray items, Action notFoundCallback, Action<item> callback = null)
    {
        foreach (var item in items)
        {
            item _item;

            try
            {
                _item = new item(item.ToObject<JObject>());
                callback?.Invoke(_item);
            }
            catch (Exception ex)
            {
                if (notFoundCallback == null)
                    Console.WriteLine(ex, Color.Red);
                else
                    notFoundCallback();

                continue;
            }

            yield return _item;
        }
    }
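
A note on the counters in `GetItems`: `BuildEnumerable` is an iterator method, so its body (including the callback) only runs when the returned sequence is enumerated. That means `notFoundItems`, `nonxxxItems` and `xxxItems` are still 0 when they are added to the static totals, because `all` has not been enumerated at that point. The `+=` updates on the static totals are also not atomic when `GetItems` runs on several threads. The sketch below illustrates both points with hypothetical names (`DeferredCounting`, `Total`); it assumes the totals are `long` fields, which the original does not show.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;

public static class DeferredCounting
{
    // Hypothetical shared counter, standing in for totals like NotFoundItems.
    public static long Total;

    // Iterator: the loop body, including the callback, runs only during enumeration.
    public static IEnumerable<int> Build(IEnumerable<int> source, Action<int> callback)
    {
        foreach (var value in source)
        {
            callback(value);
            yield return value;
        }
    }

    // Mirrors the pattern in GetItems: the counter is read before enumeration.
    public static int CountBeforeEnumerating(int[] source)
    {
        int seen = 0;
        var all = Build(source, _ => ++seen);
        return seen; // the sequence was never enumerated, so the callback never ran
    }

    // Materializes the sequence first, so the callback has run for every element.
    public static int CountAfterMaterializing(int[] source)
    {
        int seen = 0;
        var all = Build(source, _ => ++seen).ToList();
        Interlocked.Add(ref Total, seen); // atomic update, safe across threads
        return seen;
    }
}
```

In `GetItems`, the equivalent change would be to call `.ToList()` (or `.ToArray()`) on the result of `BuildEnumerable` before updating the totals.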

So, as you can see, I create 100 parallel POST requests using an `HttpClient`, but I can't get it to work.
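
For reference, issuing the requests concurrently does not need `AsParallel`: the calls are I/O-bound, so starting the tasks and awaiting them with `Task.WhenAll` is usually enough. Below is a minimal sketch, not a tested fix for this program. `RunThrottledAsync` and `FakeGetItemsAsync` are hypothetical names; the latter stands in for `client.GetItems` so the example runs without a network, and the `SemaphoreSlim` cap is an assumption about how one might limit the number of requests in flight.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class ThrottledFetch
{
    // Hypothetical stand-in for client.GetItems(ids): simulates an I/O-bound call.
    public static async Task<int[]> FakeGetItemsAsync(ulong[] ids)
    {
        await Task.Delay(10).ConfigureAwait(false);
        return ids.Select(id => (int)(id % 7)).ToArray();
    }

    // Starts one task per input, but lets at most maxConcurrency run the work at once.
    public static async Task<TResult[]> RunThrottledAsync<TInput, TResult>(
        IEnumerable<TInput> inputs,
        Func<TInput, Task<TResult>> work,
        int maxConcurrency)
    {
        using (var gate = new SemaphoreSlim(maxConcurrency))
        {
            var tasks = inputs.Select(async input =>
            {
                await gate.WaitAsync().ConfigureAwait(false);
                try
                {
                    return await work(input).ConfigureAwait(false);
                }
                finally
                {
                    gate.Release();
                }
            }).ToList(); // materialize so each task is created exactly once

            // Completes when every task has finished; results keep input order.
            return await Task.WhenAll(tasks).ConfigureAwait(false);
        }
    }
}
```

Under these assumptions, the call site would look like `var result = await ThrottledFetch.RunThrottledAsync(ids, x => client.GetItems(x), maxTasks);`, with no blocking `GetAwaiter().GetResult()` and no shared `List` mutated from several threads.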

What I want to achieve is to retrieve as many items as possible, because I need to crawl more than 2,000,000,000 items.

But no breakpoint is hit, and no caption is updated on the console (I'm using the Konsole project to print values at a fixed position on the console). Can anyone give me some advice?

z3nth10n
  • The short answer is you likely shouldn't be using `Parallel` (you don't need it to run web requests in parallel). https://learn.microsoft.com/en-us/dotnet/csharp/programming-guide/concepts/async/how-to-make-multiple-web-requests-in-parallel-by-using-async-and-await is where I would suggest starting. – mjwills Aug 11 '20 at 11:56
  • Well, the thing is that I'm not sure how to get this to do anything. For now, the only thing that happens is that 193 MB are allocated in memory and CPU load increases for a short time, but I don't know what's happening. Any help on how to achieve this with this implementation or any other is appreciated! – z3nth10n Aug 11 '20 at 11:56
  • @mjwills thanks for that, but... I want to use parallel tasks in order to process as many `item`s as possible. – z3nth10n Aug 11 '20 at 11:57
  • I'm using .NET Framework 4.6.1. – z3nth10n Aug 11 '20 at 11:58
  • https://stackoverflow.com/a/48785949/34092 will almost certainly be part of your solution. .NET Framework won't allow 100 concurrent requests to a given website by default. – mjwills Aug 11 '20 at 12:03
  • `HttpClient` is still limited by `ServicePointManager.DefaultConnectionLimit` on .NET Framework. I can't really give a "more full" answer. It just is. You need to up the limit. – mjwills Aug 11 '20 at 12:09
  • Well, somebody on WhatsApp is telling me to use HttpClientFactory in order to request the maximum number of sockets the system has available at that moment. Can anybody give me more suggestions? – z3nth10n Aug 11 '20 at 12:12
  • What did you set `ServicePointManager.DefaultConnectionLimit` to? Did https://learn.microsoft.com/en-us/dotnet/csharp/programming-guide/concepts/async/how-to-make-multiple-web-requests-in-parallel-by-using-async-and-await help? – mjwills Aug 11 '20 at 12:16
  • I changed it to 1,000, but this is doing the same. So I'll switch to .NET Core (I'll execute it on a Linux system) – z3nth10n Aug 11 '20 at 12:19
  • Did you switch from using `Parallel` to https://learn.microsoft.com/en-us/dotnet/csharp/programming-guide/concepts/async/how-to-make-multiple-web-requests-in-parallel-by-using-async-and-await ? – mjwills Aug 11 '20 at 12:43
  • Your code is, sincerely, a mess, because you are trying to combine incompatible tools. Also, defining an abstractly named `GetItems` extension method for the `HttpClient` class, containing application-specific logic, is horrible. Anyway, I suggest forgetting `AsParallel` and looking at the TPL Dataflow library. It is the right tool for this job IMHO. You can see a usage example [here](https://stackoverflow.com/questions/60929044/c-sharp-parallel-foreach-memory-usage-keeps-growing/60930992#60930992). – Theodor Zoulias Aug 11 '20 at 14:38
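
As the comments above note, on .NET Framework `HttpClient` is bound by `ServicePointManager.DefaultConnectionLimit`. Below is a sketch of raising it, assuming it is set once at startup before the first request is sent (as far as the documentation describes it, the value only applies to `ServicePoint` objects created after the change). `NetworkSetup.Configure` is a hypothetical helper name, and a value such as 100 would simply mirror `maxTasks`; it is not a recommendation.

```csharp
using System.Net;

public static class NetworkSetup
{
    // Hypothetical helper: call once at startup, before any request is sent.
    public static void Configure(int maxConnectionsPerHost)
    {
        // Raises the per-host connection cap used on .NET Framework.
        ServicePointManager.DefaultConnectionLimit = maxConnectionsPerHost;
    }
}
```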
