Given this code:
```csharp
using System.Collections.Generic;
using System.Linq;

public enum QualificationType { Normal, Special }

public class Customer
{
    public int CustomerID { get; set; }
    public string Name { get; set; }
    public List<Qualification> Qualifications { get; set; } = new List<Qualification>();
}

public class Qualification
{
    public QualificationType QualificationType { get; set; }
    public decimal Value { get; set; }
}

public class Action
{
    public int ActionID { get; set; }
    public int CustomerID { get; set; }
    public decimal ActionValue { get; set; }
}

public class Service : IService
{
    // IService, _customerService, _actionService and SpecialRules are defined elsewhere.
    public List<Customer> ProcessCustomers()
    {
        List<Customer> customers = _customerService.GetCustomers(); // 250,000 customers
        List<Action> actions = _actionService.GetActions();         // 6,000 actions

        foreach (var action in actions)
        {
            // Normal qualification for every customer at or above the action's customer.
            foreach (var affectedCustomer in customers.Where(x => x.CustomerID >= action.CustomerID))
            {
                affectedCustomer.Qualifications.Add(new Qualification { QualificationType = QualificationType.Normal, Value = action.ActionValue });
            }

            // Special qualification for the few (8-10) customers matching SpecialRules.
            foreach (var affectedCustomer in customers.Where(x => SpecialRules(x)))
            {
                affectedCustomer.Qualifications.Add(new Qualification { QualificationType = QualificationType.Special, Value = action.ActionValue });
            }
        }

        return customers;
    }
}
```
The "Most Qualified" Customer may end up with 12,000 Qualifications. On average, customers may end up with ~100 qualifications.
But I get an OutOfMemoryException very early, after only about 50 actions have been processed. At that point my list still holds just the 250,000 customers, but about 5,000,000 qualifications have been added across them.
Is that a lot? It seems a bit underwhelming to me. I expected I could have tens of millions of customers, each with an average of 1,000 qualifications, and still be fine. I'm not even close to that.
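To put these counts in perspective, here is a rough per-record memory estimate. It is a sketch under assumptions, not a measurement: it assumes a 64-bit .NET runtime, where each object carries about 16 bytes of header and type pointer, the `decimal` and enum fields take 20 bytes (padded to 24), and each list slot holds an 8-byte reference. It ignores the spare capacity `List<T>` keeps after growing and any GC overhead, so real usage will be higher. The class `QualificationMemoryEstimate` is a hypothetical helper.

```csharp
// Rough memory estimate for many small Qualification objects.
// ASSUMPTION: 64-bit .NET object layout; ignores List<T> spare capacity
// and GC overhead, so real usage is higher.
static class QualificationMemoryEstimate
{
    // Object header (8 bytes) + method-table pointer (8 bytes).
    public const long ObjectOverheadBytes = 16;

    // decimal Value (16 bytes) + int-backed enum (4 bytes), padded to 8-byte alignment.
    public const long FieldBytes = 24;

    // One reference per element in the List<Qualification> backing array.
    public const long ReferenceBytes = 8;

    public const long BytesPerQualification = ObjectOverheadBytes + FieldBytes + ReferenceBytes;

    public static long EstimateBytes(long qualificationCount) =>
        qualificationCount * BytesPerQualification;
}
```

Under these assumptions, `EstimateBytes(5_000_000)` gives 240,000,000 bytes (about 240 MB), `EstimateBytes(1_200_000_000)` gives about 57.6 GB, and 10 million customers with 1,000 qualifications each would come to roughly 480 GB.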
What can I do in code to make this more efficient? I realize I could write the results of each action (or of batches of actions) to a database, but I'd rather do as much as possible in memory before writing the results.
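The batching option mentioned here could look roughly like this, using `Enumerable.Chunk` (available from .NET 6). `ProcessInBatches`, `processBatch` and `flush` are hypothetical names: `processBatch` stands in for the loop body of `ProcessCustomers`, and `flush` for whatever bulk database write you would use.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Sketch: process actions in fixed-size groups and hand each group's results
// to a caller-supplied flush callback (e.g. a bulk database insert).
// ASSUMPTIONS: Enumerable.Chunk needs .NET 6+; processBatch and flush are
// hypothetical stand-ins for the question's loop body and DB write.
static class BatchedProcessing
{
    public static int ProcessInBatches<TAction, TResult>(
        IEnumerable<TAction> actions,
        int batchSize,
        Func<TAction[], List<TResult>> processBatch,
        Action<List<TResult>> flush)
    {
        int batches = 0;
        foreach (TAction[] batch in actions.Chunk(batchSize))
        {
            // Each batch's results can be released after flushing,
            // so they need not all be held in memory at once.
            List<TResult> results = processBatch(batch);
            flush(results);
            batches++;
        }
        return batches;
    }
}
```

The batch size trades peak memory against the number of database round trips.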
The code cycles through the 6,000 actions and, for each one, adds qualifications to a variable number of customers. For each action, every customer whose CustomerID is >= the CustomerID of the customer who caused the action gets a Normal qualification, which adds up to ~1.2 billion records. In addition, for each action, 8-10 customers receive a Special qualification: a tiny 60,000 records (6,000 × ~10) compared to the 1.2 billion.
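Because the record count follows directly from this rule, it can be computed before allocating anything. This sketch counts the Normal qualifications by sorting the customer IDs once and binary-searching each action's CustomerID, instead of scanning all customers per action as the `Where` filter does. `QualificationCounts` and `LowerBound` are hypothetical helpers, and Special qualifications are not counted.

```csharp
using System;

// Sketch: count the Normal qualifications the loop would create, without
// creating them. Rule (from the question): each action adds one Normal
// qualification to every customer whose CustomerID >= the action's CustomerID.
static class QualificationCounts
{
    public static long CountNormalQualifications(int[] customerIds, int[] actionCustomerIds)
    {
        int[] sorted = (int[])customerIds.Clone();
        Array.Sort(sorted);

        long total = 0;
        foreach (int actionCustomerId in actionCustomerIds)
        {
            // Customers from this index to the end have CustomerID >= actionCustomerId.
            int first = LowerBound(sorted, actionCustomerId);
            total += sorted.Length - first;
        }
        return total;
    }

    // First index i with sorted[i] >= value, or sorted.Length if there is none.
    private static int LowerBound(int[] sorted, int value)
    {
        int lo = 0, hi = sorted.Length;
        while (lo < hi)
        {
            int mid = lo + (hi - lo) / 2;
            if (sorted[mid] < value) lo = mid + 1; else hi = mid;
        }
        return lo;
    }
}
```

Passing the 250,000 customer IDs and the 6,000 action CustomerIDs returns the exact total the Normal loop would add, which can be checked against the ~1.2 billion estimate.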
I was trying to do this in memory because I don't want to perform billions of record inserts into a database. I will need the individual records for the next step of processing, which looks at each customer's qualifications and at how they differ from step to step as you go through the CustomerIDs from top to bottom. In the end I do put results in the database, but they are more complex than SUMs, and I can only compute them by looking at the step-by-step differences in individual qualifications, like grading on a curve.
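Under the same >= rule, customers sorted by CustomerID differ only by the actions whose CustomerID falls between their IDs: each customer has all the Normal qualifications of the previous one plus one per action in (previous ID, current ID]. This sketch yields just that difference for each customer, which is one possible way to drive a step-by-step comparison without storing every record. It assumes actions are given as `(CustomerID, ActionValue)` tuples and covers only Normal qualifications; `QualificationSteps.StepDifferences` is a hypothetical name, and whether this fits the real next step depends on what that step needs. It walks upward from the lowest CustomerID; going top-down, the same differences apply in reverse.

```csharp
using System.Collections.Generic;
using System.Linq;

// Sketch: consecutive customers (ascending CustomerID) differ only by the
// actions whose CustomerID lies in (previousId, currentId].
// ASSUMPTIONS: Special qualifications are not modeled; actions are
// (CustomerID, ActionValue) tuples.
static class QualificationSteps
{
    public static IEnumerable<(int CustomerID, List<decimal> AddedValues)> StepDifferences(
        IEnumerable<int> customerIds,
        IEnumerable<(int CustomerID, decimal ActionValue)> actions)
    {
        // OrderBy is a stable sort, so actions with equal CustomerID keep their order.
        var sortedActions = actions.OrderBy(a => a.CustomerID).ToList();
        int next = 0;

        foreach (int id in customerIds.OrderBy(c => c))
        {
            var added = new List<decimal>();
            // Every not-yet-consumed action with CustomerID <= id now applies.
            while (next < sortedActions.Count && sortedActions[next].CustomerID <= id)
            {
                added.Add(sortedActions[next].ActionValue);
                next++;
            }
            yield return (id, added);
        }
    }
}
```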