
I need to retrieve 200k records from SQL Server. The SQL returns all of the data in one operation. Each row represents a customer; one customer has multiple locations associated with it, and each location has multiple licenses associated with it. Performance is the most important factor here, and I need to know whether my method is the fastest or whether there is some faster way of loading all of these SQL records into a C# container. I use a dictionary and assign each field from the SQL data reader using the corresponding column index. A sample of the code is below. Right now it takes 12 minutes for .NET to process 269,000 rows. Is there a faster way to do this?

EDIT - Thanks for all of the fantastic advice. I will elaborate on the details more, because I simplified what I showed you here to give you an idea of how I handle nested lists.

There are about 4,000 agent records, and each one must be sent to a vendor every day. Each agent record has a list of addresses, contacts, and locations. In the case of locations, each location can have a list of addresses, contacts, licenses, and people for that location.

The original developer sent a SQL query to get a list of all customer ID numbers (the record key). Then he does a for loop and sends an individual SQL request with the specific customer ID to get the list of addresses, then a different SQL request for a list of the contacts, and so forth. Once he adds a location to the main agent list, he then makes 7 separate SQL calls to get all of the lists for that location. This process takes nearly 5 hours at run time. My boss gave it to me and wants me to get it running in under 15 minutes.

My first approach is to send one request to the SQL server, get all agents and all child lists at one time, and send the entire result set back to the code. This was done to eliminate all of the per-record SQL calls. Then I want to load all of this data into the agent records and their nested lists.

Right now I have 100 fields in my SELECT and 26 LEFT JOINs, and some of those joins connect to the same table but with a different ID for a different record. For example, there is a join to the address table that matches the address ID to the agent ID. Then there is also another join to the address table, but for an address in the location list, with a match on the location ID to the address ID.
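A stripped-down sketch of that query shape (the table, alias, and column names here are invented; the real query has 100 fields and 26 joins):

// illustrative only: one row per agent/location/license combination,
// so parent columns repeat on every child row
const string sql = @"
    SELECT a.CUS_ID_SEQ, loc.LOC_ID_SEQ, lic.LICENSE_ID, a.CUS_NAME
    FROM Agent a
    LEFT JOIN Address adrA ON adrA.ADR_ID = a.CUS_ADR_ID_FK        -- agent's address
    LEFT JOIN Location loc ON loc.LOC_CUS_ID_SEQ_FK = a.CUS_ID_SEQ
    LEFT JOIN Address adrL ON adrL.ADR_ID = loc.LOC_ADR_ID_FK      -- location's address
    LEFT JOIN License lic  ON lic.LIC_LOC_ID_SEQ_FK = loc.LOC_ID_SEQ
    ORDER BY a.CUS_ID_SEQ, loc.LOC_ID_SEQ;";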

So now you know the full story and you have seen my code. I am going to incorporate some of the suggestions you provided and see what kind of improvement I get. Once I have something faster, I will post it here so you can see what I did. If you have any further advice, I would love to hear it.

public class LicenseSF
{
    public int LicenseId { get; set; }
}

public class LocationSF
{
    public int LocIdSeq { get; set; }
    public List<LicenseSF> Licenses { get; set; }
}

public class AgentRecord
{
    public int CusIdSeq { get; set; }
    public List<LocationSF> Locations { get; set; }
}


Dictionary<int, AgentRecord> agencyDict = new Dictionary<int, AgentRecord>();

using (var con = new SqlConnection(connectionString))
{
    using (var cmd = new SqlCommand(sql, con))
    {
        con.Open();
        using (SqlDataReader reader = await cmd.ExecuteReaderAsync())
        {
            if (reader.HasRows)
            {
                while (reader.Read())
                {
                    int cusId = !reader.IsDBNull(0) ? reader.GetInt32(0) : -1;
                    int LocIdSeq = !reader.IsDBNull(1) ? reader.GetInt32(1) : -1;
                    int LicenseId = !reader.IsDBNull(2) ? reader.GetInt32(2) : -1;
                    string CUS_Name = !reader.IsDBNull(3) ? reader.GetString(3) : string.Empty;

                    //first, add the new agent if it doesn't already exist in the dictionary
                    AgentRecord agentSF;
                    agencyDict.TryGetValue(cusId, out agentSF);
                    if (agentSF == null)
                    {
                        agentSF = new AgentRecord
                        {
                            CusIdSeq = cusId
                        };
                        agencyDict[cusId] = agentSF;
                    }

                    //if the agent has a location, add it to the list for this agent
                    if (!Convert.IsDBNull(LocIdSeq) && LocIdSeq > 0)
                    {
                        if (agentSF.Locations == null)
                            agentSF.Locations = new List<LocationSF>();

                        LocationSF locSF = agentSF.Locations.FirstOrDefault(z => z.LocIdSeq == LocIdSeq);

                        if (locSF == null)
                        {
                            locSF = new LocationSF
                            {
                                LocIdSeq = LocIdSeq
                            };
                            agentSF.Locations.Add(locSF);
                        }

                        //a location can have multiple licenses; add them to this specific location
                        if (!Convert.IsDBNull(LicenseId) && LicenseId > 0)
                        {
                            if (locSF.Licenses == null)
                                locSF.Licenses = new List<LicenseSF>();

                            LicenseSF licSF = locSF.Licenses.FirstOrDefault(z => z.LicenseId == LicenseId);

                            if (licSF == null)
                            {
                                licSF = new LicenseSF
                                {
                                    LicenseId = LicenseId
                                };
                                locSF.Licenses.Add(licSF);
                            }
                        }
                    }
                }
            }
        }
    }
}
WorkJ

  • At least specify the size of the dictionary, `var agencyDict = new Dictionary<int, AgentRecord>(num_of_customers);`, so it is not expanded multiple times. – Guru Stron Mar 27 '23 at 22:19
  • Also how many locations per customer and licenses per location are you expecting? – Guru Stron Mar 27 '23 at 22:22
  • But in general - you need to profile and see what is taking that much time. If it is reading from the database, then there is not much you can do except parallelize the requests. – Guru Stron Mar 27 '23 at 22:25
  • Personally I would try reading everything into a single collection of intermediate objects and then building the dictionary using several `GroupBy`'s (see the first sketch after these comments). – Guru Stron Mar 27 '23 at 22:27
  • Is it me or is licSF not actually added anywhere? I suspect the FirstOrDefault stuff is what's slowing this code down. Also here, `!Convert.IsDBNull(LicenseId) && LicenseId > 0`, isn't `LicenseId > 0` enough? – siggemannen Mar 27 '23 at 22:41
  • Are you taking over the job of SQL Server? Why use SQL anyway? It's faster to load the info from a binary file. You'd probably need to build smarter queries or have a better selection of information, instead of downloading the whole database. – Jeroen van Langen Mar 27 '23 at 22:41
  • I have had similar issues with `List.FirstOrDefault`. To speed things up, I have found that a helper `Dictionary` works wonders. It may be counterintuitive that adding a value to a `List` and a `Dictionary` speeds things up, but if your `Dictionary` maps the index in the `List` to the value, then the lookup is immensely faster. Checking a `Dictionary` for the presence of a value is hugely faster than calling `List.FirstOrDefault`. In such cases I use what I term a reverse `Dictionary`: where the key is a value, and the value an index. – Jonathan Willcock Mar 27 '23 at 22:48
  • Look into [Dapper](https://github.com/DapperLib/Dapper), particularly the `splitOn` parameter (see the second sketch after these comments), or [Entity Framework](https://learn.microsoft.com/en-us/ef/) (but perf will be a little less). Another thing is: let the DBMS sort your data (by x, y and z for example) and just skip all kinds of checks and add sequentially in one single go. Try to leverage the DBMS as much as possible; let *it* do the hard work whenever possible. 200K records should be peanuts and be done in less than a few seconds *tops*. – RobIII Mar 27 '23 at 23:38
  • You could use Parallel.For and avoid LINQ or `List<>`. Also, SqlBulkCopy is a good option. I do everything with a DataTable; see this answer: https://stackoverflow.com/a/69574353/888472 – Leandro Bardelli Mar 27 '23 at 23:51
  • You could also replace Dictionary with ConcurrentDictionary – Leandro Bardelli Mar 27 '23 at 23:54
  • The fastest way is not to load them at all. Why do you need all these objects in memory? – Jeremy Lakeman Mar 28 '23 at 00:04
  • To identify the most time-consuming part of the process: comment out the code inside the `while (reader.Read())` loop starting from `agencyDict.TryGetValue(cusId, out agentSF);`. You will see the amount of time spent just reading the table without storing the data in the dictionary. In my opinion, the problem comes from the SQL reader rather than the dictionary construction, but I may be mistaken. – Graffito Mar 28 '23 at 00:18
  • You can use multiple threads to read the entries into a list, where each successive thread checks whether the list already contains the customer entry before continuing – Son of Man Mar 28 '23 at 04:31
  • There's always a faster way. What should matter to you is making your solution _fast enough for your business need_. – TylerH Mar 28 '23 at 13:54
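A minimal sketch of the intermediate-objects-plus-GroupBy suggestion above, using the question's classes (the `Row` record, the presized capacity, and the reduction to three key columns are assumptions):

// hypothetical flat row matching the reader's columns
// (declared alongside the other classes, not inside a method)
record Row(int CusId, int LocId, int LicenseId);

// inside the same reader loop as in the question:
var rows = new List<Row>(1_500_000); // presize to the expected row count
while (reader.Read())
{
    rows.Add(new Row(
        reader.IsDBNull(0) ? -1 : reader.GetInt32(0),
        reader.IsDBNull(1) ? -1 : reader.GetInt32(1),
        reader.IsDBNull(2) ? -1 : reader.GetInt32(2)));
}

// build the nested structure with GroupBy instead of per-row list scans
var agencyDict = rows
    .GroupBy(r => r.CusId)
    .ToDictionary(
        g => g.Key,
        g => new AgentRecord
        {
            CusIdSeq = g.Key,
            Locations = g.Where(r => r.LocId > 0)
                         .GroupBy(r => r.LocId)
                         .Select(lg => new LocationSF
                         {
                             LocIdSeq = lg.Key,
                             Licenses = lg.Where(r => r.LicenseId > 0)
                                          .Select(r => r.LicenseId)
                                          .Distinct()
                                          .Select(id => new LicenseSF { LicenseId = id })
                                          .ToList()
                         })
                         .ToList()
        });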
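And a sketch of the Dapper `splitOn` multi-mapping idea (this assumes the query's columns are aliased to match the property names, and that the Dapper NuGet package is referenced):

var agents = new Dictionary<int, AgentRecord>();

con.Query<AgentRecord, LocationSF, LicenseSF, AgentRecord>(
    sql,
    (a, loc, lic) =>
    {
        // Dapper hands back one (agent, location, license) triple per row;
        // we dedupe the parents ourselves
        if (!agents.TryGetValue(a.CusIdSeq, out var agent))
            agents[a.CusIdSeq] = agent = a;

        if (loc != null) // null when the LEFT JOIN produced no location
        {
            var known = agent.Locations?.FirstOrDefault(l => l.LocIdSeq == loc.LocIdSeq);
            if (known == null)
                (agent.Locations ??= new List<LocationSF>()).Add(known = loc);

            if (lic != null)
                (known.Licenses ??= new List<LicenseSF>()).Add(lic);
        }
        return agent;
    },
    splitOn: "LocIdSeq,LicenseId");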

3 Answers


It has been a long time since I had to move lots of data between C# and SQL Server, so things may have changed.

It was normally faster to write the data to a file using one of SQL Server's data export tools and then read the file in C#. It could also be faster to export the data from each table to a separate file and then process the "join" outside of SQL Server, if you don't care about locking records to protect against concurrent updates.
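A rough sketch of that approach, exporting with the `bcp` utility and streaming the flat file back in C# (the query, table name, file name, and delimiter are made up):

// run once from a shell (or Process.Start) to export the query to a delimited file:
//   bcp "SELECT CUS_ID_SEQ, LOC_ID_SEQ, LICENSE_ID FROM dbo.AgentExport" queryout agents.dat -c -t"|" -S myServer -T

// then parse the file with no database round-trips (File.ReadLines is in System.IO):
foreach (string line in File.ReadLines("agents.dat"))
{
    string[] fields = line.Split('|');
    int cusId = int.Parse(fields[0]);
    // ... build the same dictionary and nested lists as in the question ...
}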

Ian Ringrose
  • I hadn't thought of this. Do you have any benchmarks for records per minute or per second? – WorkJ Mar 29 '23 at 03:46

There are a number of efficiencies you can make.

  • Firstly, convert the lists into dictionaries, and initialize them in the class (see the revised classes below).
  • Presize your dictionaries if you know how much data you are getting.
  • async code is often slightly slower when you can dedicate a thread. It is more responsive, and it performs better in scenarios where you run all of this many times a second, so it depends on the use case.
  • `Convert.IsDBNull` is wrong; just compare to -1. Alternatively, use a nullable `int?`.
  • The reader needs a `using` block.
  • `reader.HasRows` is unnecessary.
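For the first two bullets, the classes might become something like this (a sketch: locations keyed by LocIdSeq, licenses by LicenseId):

public class LicenseSF
{
    public int LicenseId { get; set; }
}

public class LocationSF
{
    public int LocIdSeq { get; set; }
    public Dictionary<int, LicenseSF> Licenses { get; } = new Dictionary<int, LicenseSF>();
}

public class AgentRecord
{
    public int CusIdSeq { get; set; }
    public Dictionary<int, LocationSF> Locations { get; } = new Dictionary<int, LocationSF>();
}

The reading loop then becomes: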
// presize if you know roughly how many agents to expect
var agencyDict = new Dictionary<int, AgentRecord>(4000);

using var con = new SqlConnection(connectionString);
using var cmd = new SqlCommand(sql, con);
con.Open();
using var reader = cmd.ExecuteReader();
while (reader.Read())
{
    int cusId = !reader.IsDBNull(0) ? reader.GetInt32(0) : -1;
    int LocIdSeq = !reader.IsDBNull(1) ? reader.GetInt32(1) : -1;
    int LicenseId = !reader.IsDBNull(2) ? reader.GetInt32(2) : -1;
    string CUS_Name = !reader.IsDBNull(3) ? reader.GetString(3) : string.Empty;

    //first, add the new agent if it doesn't already exist in the dictionary
    if (!agencyDict.TryGetValue(cusId, out var agentSF))
    {
        agentSF = new AgentRecord
        {
            CusIdSeq = cusId
        };
        agencyDict[cusId] = agentSF;
    }

    //if the agent has a location, add it to the dictionary for this agent
    if (LocIdSeq > 0)
    {
        if (!agentSF.Locations.TryGetValue(LocIdSeq, out var locSF))
        {
            locSF = new LocationSF
            {
                LocIdSeq = LocIdSeq
            };
            agentSF.Locations.Add(LocIdSeq, locSF);
        }

        //a location can have multiple licenses; add them to this specific location
        if (LicenseId > 0 && !locSF.Licenses.ContainsKey(LicenseId))
        {
            locSF.Licenses.Add(LicenseId, new LicenseSF
            {
                LicenseId = LicenseId
            });
        }
    }
}

Finally, given that you are clearly using a three-way join, consider three separate select statements, which may be more performant. You can use the dictionary result of the first to feed the second, etc.
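A rough sketch of that shape, using the dictionary-based classes above (the query text and column names are illustrative, and the sketch assumes every child row's parent exists):

var agencyDict = new Dictionary<int, AgentRecord>(4000);
var locationsById = new Dictionary<int, LocationSF>();

// 1. agents
using (var cmd = new SqlCommand("SELECT CUS_ID_SEQ FROM Agent", con))
using (var reader = cmd.ExecuteReader())
    while (reader.Read())
    {
        int cusId = reader.GetInt32(0);
        agencyDict[cusId] = new AgentRecord { CusIdSeq = cusId };
    }

// 2. locations: attach each row to its parent agent via the first dictionary
using (var cmd = new SqlCommand("SELECT LOC_ID_SEQ, CUS_ID_SEQ FROM Location", con))
using (var reader = cmd.ExecuteReader())
    while (reader.Read())
    {
        int locId = reader.GetInt32(0);
        var loc = new LocationSF { LocIdSeq = locId };
        locationsById[locId] = loc;
        agencyDict[reader.GetInt32(1)].Locations.Add(locId, loc);
    }

// 3. licenses: attach each row to its parent location via the second dictionary
using (var cmd = new SqlCommand("SELECT LICENSE_ID, LOC_ID_SEQ FROM License", con))
using (var reader = cmd.ExecuteReader())
    while (reader.Read())
    {
        int licId = reader.GetInt32(0);
        locationsById[reader.GetInt32(1)].Licenses.Add(licId, new LicenseSF { LicenseId = licId });
    }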

Charlieface

> This process takes nearly 5 hours at run time. My boss gave it to me and wants me to get it running in under 15 minutes.

> Right now it takes 12 minutes for .NET to process 269,000 rows.

Your job here is already done, because your boss gave you a performance budget (15 minutes) and you're comfortably within that budget.

We can give you faster code if you let us know what your new performance budget is. That's because the answer for a 10-minute budget will be quite different to the answer for a 1-minute budget.

HTTP 410
  • When I wrote that it took 12 minutes, I wasn't finished with the SQL for all of the nested lists. Today, I had time to add another list, and now the SQL query returns 1.4 million records and takes 10 minutes. That time is before the C# code starts loading it into the dictionary and lists. I have 3 more lists before the final SQL is complete, so I still need help getting this operation to be as fast as possible. – WorkJ Mar 29 '23 at 03:43