
I am trying to run a query that normally works fine, but when the list of ids is too big it fails with a timeout error.

This is my code:

public async Task<IEnumerable<User>> GetUsers(IEnumerable<int> ids)
{
    if (ids.IsNullOrEmpty())
    {
        return Enumerable.Empty<User>();
    }

    string query = $@"  SELECT  *
                        FROM    dbo.Users
                        WHERE   Id IN ({string.Join(",", ids)})";

    using (SqlConnection conn = new SqlConnection(ConnectionString))
    {
        return await conn.QueryAsync<User>(query);
    }
}

If the ids count is small (less than about 100K) it works fine, but if the count is around 1 million it causes a SQL timeout exception.

When I look at SQL Profiler, I see that the query didn't even run.

What am I doing wrong? What would be a good solution or a good way to deal with huge amounts of data in C# => SQL Server queries?

Thanks in advance.

Michał Turczyn
Misha Zaslavsky
  • Parametrise your SQL properly and this problem solves itself. I suggest looking at table-value parameters – Thom A Apr 11 '19 at 12:05
  • One million values means a table. Whether that's a table-valued parameter, or a table you bulk copy values into (either possibly an in-memory table) is your choice. `IN` won't take that many values. Note that you should really question *why* your application ever has a need to pass in 1M values -- who is going to meaningfully query for 1M separate users? Wouldn't you just get all of them in that case and have no need for passing values? – Jeroen Mostert Apr 11 '19 at 12:05
  • Delays mean missing indexes. If you pass 100K IDs in an IN clause, much less 1M, there's something wrong with the logic. *Why* pass so much data there in the first place? – Panagiotis Kanavos Apr 11 '19 at 12:07
  • And yes, 1M or 100K means a table with indexes so SQL Server can optimize the operation. With 100K and no indexes, SQL Server will have to seek once for each value. No matter how `Users` is indexed, 1M items in the `IN` clause means 1M index seeks. To get a *good* execution plan, and assuming `Users.Id` is a primary key, SQL Server would have to store those IDs in a temporary table, add an index on it and only then join the temp table with `Users`. Even so, retrieving *1 million user records* will be slow – Panagiotis Kanavos Apr 11 '19 at 12:08
  • Do those 100K Ids have a connection? From 1 to 100_001? Are you getting the whole table in bunches of 100K? There is something missing; I can't find any use case that looks like that. How did you get those IDs in the first place? Why don't you simply filter the table using the rules you used to get them? – xdtTransform Apr 11 '19 at 12:58
  • And how much data are you getting in return? – xdtTransform Apr 11 '19 at 12:59
  • The DB normally has limits on the number of values in an IN clause. There are some workarounds in the following answer: [DB IN clause workaround](https://stackoverflow.com/questions/1069415/limit-on-the-where-col-in-condition) – Guilherme Alencar Apr 11 '19 at 12:08
  • This isn't about database limits, it's about faulty logic. Even if one could pass 1M items, the query would perform very badly – Panagiotis Kanavos Apr 11 '19 at 12:09
  • Maybe your database didn't reach the limit yet, but it has limits. But I get your problem; I suggest you store these Ids temporarily in a table and then do a JOIN with the user table, because joins perform better than an IN clause or subselects – Guilherme Alencar Apr 11 '19 at 12:18
  • Can't you change the logic of your app to make this statement use a subselect for the ids? – Carl Verret Apr 11 '19 at 12:20

2 Answers


I see multiple issues here, so let me take a step back:

First of all, as others mentioned, the way you're passing the parameters is suboptimal. IN isn't made to take such long lists of values. In those cases IN uses an enormous amount of resources, which most probably leads to the issue you're seeing.

The technical solution would be to create a temporary table, fill it with the ids, and join it against the user table. As pointed out in the comments, FROM (VALUES()) is not the same thing; it can work in certain cases but not in others.

With a temporary table this would look like

    string query = $@"  CREATE TABLE #ids (id INT PRIMARY KEY);

                        INSERT INTO #ids VALUES {string.Join(",", ids.Select(id => $"({id})"))};

                        SELECT  u.*
                        FROM    #ids Ids
                        JOIN    dbo.Users u ON u.Id = Ids.id";

Keep in mind that a local temp table needs the `#` prefix (a bare `@ids` would have to be a declared table variable), and that a single `INSERT ... VALUES` list is capped at 1,000 rows, so for this many ids the insert itself has to be done in batches.

It would also be possible to use FROM (VALUES(...)), but as a commenter below mentioned, this is not guaranteed to work. It would look something like this though:

    string query = $@"  SELECT  u.*
                        FROM    (
                                    VALUES {string.Join(",", ids.Select(id => $"({id})"))} 
                                ) AS Ids(Id)
                        JOIN    dbo.Users u ON u.Id = Ids.Id";
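
A table-valued parameter, as suggested in the comments, avoids building SQL strings entirely. Here is a minimal sketch with Dapper, assuming a user-defined table type (the name `dbo.IntIdList` is an assumption; you would create it once on the server):

```csharp
// Run once on the server (hypothetical type name):
// CREATE TYPE dbo.IntIdList AS TABLE (Id INT PRIMARY KEY);

using System.Collections.Generic;
using System.Data;
using System.Threading.Tasks;
using Dapper;
using Microsoft.Data.SqlClient;

public static async Task<IEnumerable<User>> GetUsersTvp(
    string connectionString, IEnumerable<int> ids)
{
    // Copy the ids into a DataTable whose shape matches dbo.IntIdList.
    var table = new DataTable();
    table.Columns.Add("Id", typeof(int));
    foreach (int id in ids)
        table.Rows.Add(id);

    const string query = @"SELECT  u.*
                           FROM    @Ids Ids
                           JOIN    dbo.Users u ON u.Id = Ids.Id";

    using var conn = new SqlConnection(connectionString);
    return await conn.QueryAsync<User>(
        query,
        new { Ids = table.AsTableValuedParameter("dbo.IntIdList") });
}
```

Because the ids travel as a single parameter, the query text stays constant and SQL Server can cache the plan.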

The second thing I notice is that this query always loads the complete user object. I assume the calling method then performs some operation on those users that touches just a subset of the user data? In that case it would be cleaner to return only the data required by that specific step of the application and do a limited select. This drastically reduces the amount of data that needs to be read from disk and transferred to the application, and can therefore improve performance massively.
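
For example, if that step only needs names and e-mail addresses (the column and DTO names here are assumptions), a narrow projection could look like:

```csharp
// Hypothetical slim DTO -- only the columns this step actually needs.
public class UserNameEmail
{
    public int Id { get; set; }
    public string Name { get; set; }
    public string Email { get; set; }
}

// SELECT only those columns instead of u.*:
const string query = @"SELECT  u.Id, u.Name, u.Email
                       FROM    dbo.Users u
                       WHERE   u.Id IN @Ids";
```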

crazy_crank

One approach could be to split your ids collection into chunks of, let's say, 100K (since it works with that amount of IDs) and query the database multiple times.
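
A sketch of that idea using Dapper's list expansion, with a much smaller batch size (2,000 is an assumption, chosen to stay under SQL Server's ~2,100-parameter limit):

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using Dapper;
using Microsoft.Data.SqlClient;

public static async Task<List<User>> GetUsersInBatches(
    string connectionString, IEnumerable<int> ids, int batchSize = 2000)
{
    var results = new List<User>();
    using var conn = new SqlConnection(connectionString);

    // .Chunk requires .NET 6+; on older runtimes, page through manually.
    foreach (int[] batch in ids.Chunk(batchSize))
    {
        // Dapper expands @Ids into one parameter per value, which is why
        // each batch must stay below the ~2,100 parameter limit.
        results.AddRange(await conn.QueryAsync<User>(
            "SELECT * FROM dbo.Users WHERE Id IN @Ids",
            new { Ids = batch }));
    }
    return results;
}
```

This keeps each round trip small and parameterized, at the cost of multiple queries.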

Michał Turczyn