Well, I'm trying to run a task 100 times on each run (with paralellism) but I can't manage this to work.
I'm trying to bruteforce an API, for this the API allows me to concatenate as many IDS as possible (without exceeding the timeout)
// consts:
// idsNum = 1000
// maxTasks = 100
// We prepare the ids that we will process (in that case 100,000)
var ids = PrepareIds(i * idsNum * maxTasks, idsNum, maxTasks);
// This was my old approach (didn't work)
//var result = await Task.WhenAll(ids.AsParallel().Select(async x => (await client.GetItems(x)).ToArray()));
// This is my new approach (also this didn't worked...)
var items = new List<item[]>();
ids.AsParallel().Select(x => client.GetItems(x).GetAwaiter().GetResult()).ForAll(item =>
{
//Console.WriteLine("processed!");
items.Add(item.ToArray());
});
var result = items.ToArray();
As you can see I put Console.WriteLine("processed!");
statment, in order to check if anything worked... But I can't manage this to work.
Those are my other methods:
private static IEnumerable<ulong[]> PrepareIds(int startingId, int idsNum = 1000, int maxTasks = 100)
{
for (int i = 0; i < maxTasks; i++)
yield return Range((ulong)startingId, (ulong)(startingId + idsNum)).ToArray();
}
And...
public static async Task<IEnumerable<item>> GetItems(this HttpClient client, ulong[] ids, Action notFoundCallback = null)
{
var keys = PrepareDataInKeys(type, ids); // This prepares the data to be sent to the API server
var content = new FormUrlEncodedContent(keys);
content.Headers.ContentType =
new MediaTypeHeaderValue("application/x-www-form-urlencoded") { CharSet = "UTF-8" };
client.DefaultRequestHeaders.ExpectContinue = false;
// We create a post request
var response = await client.PostAsync(new Uri(FILE_URL), content);
string contents = null;
JObject jObject;
try
{
contents = await response.Content.ReadAsStringAsync();
jObject = JsonConvert.DeserializeObject<JObject>(contents);
}
catch (Exception ex)
{
Console.WriteLine(ex);
Console.WriteLine(contents);
return null;
}
// Then we read the items from the parsed JObject
JArray items;
try
{
items = jObject
.GetValue("...")
.ToObject<JObject>()
.GetValue("...").ToObject<JArray>();
}
catch (Exception ex)
{
Console.WriteLine(ex);
return null;
}
int notFoundItems = 0;
int nonxxxItems = 0;
int xxxItems = 0;
var all = items.BuildEnumerable(notFoundCallback, item =>
{
if (item.Result != 1)
++notFoundItems;
else if (item.id != 5)
++nonxxxItems;
else
++xxxItems;
});
CrawledItems += ids.Length;
NotFoundItems += notFoundItems;
NonXXXItems += nonxxxItems;
XXXItems += xxxItems;
return all;
}
private static IEnumerable<item> BuildEnumerable(this JArray items, Action notFoundCallback, Action<item> callback = null)
{
foreach (var item in items)
{
item _item;
try
{
_item = new item(item.ToObject<JObject>());
callback?.Invoke(_item);
}
catch (Exception ex)
{
if (notFoundCallback == null)
Console.WriteLine(ex, Color.Red);
else
notFoundCallback();
continue;
}
yield return _item;
}
}
So as you can see I create 100 parallel post requests using an HttpClient
. But I can't manage it to work.
So the thing that I want to achieve is to retrieve as many items as possible because I need to crawl +2,000,000,000 items.
But any breakpoint is triggered, neither any caption is updated on Console (I'm using Konsole project in order to print values at a fixed position on console), so any advice can be given there?