13

I want to count the rows of a related table:

MainTable tbl = tblInfo(id);
var count = tbl.Related_Huge_Table_Data.Count();

The problem is that this takes too long (about 20 seconds) to execute, although when I run the equivalent query in SQL Server it completes in under a second. How can I optimize this query in LINQ? I also tried using a stored procedure, but no luck.

This is the tblInfo method:

public MainTable tblInfo(int id)
{
    MyDataContext context = new MyDataContext();
    MainTable mt = (from c in context.MainTables
                    where c.Id == id
                    select c).SingleOrDefault();
    return mt;
}

I used LINQ to SQL, and the classes were generated by LINQ to SQL.

  • 2
    What does `Related_Huge_Table_Data` return? – GSerg Feb 12 '17 at 11:15
  • @GSerg It is defined as an `EntitySet` in my LINQ to SQL, like this: `public EntitySet Related_Huge_Table_Data` –  Feb 12 '17 at 11:17
  • 4
    You [don't want to use `EntitySet`](http://stackoverflow.com/q/5173389/11683) if you want server-side operations. – GSerg Feb 12 '17 at 11:21
  • @GSerg So what should I do to optimize this? –  Feb 12 '17 at 11:23
  • 1
    You should ditch `EntitySet`. By design it loads all data in the memory before processing. – GSerg Feb 12 '17 at 11:23
  • @GSerg Thanks. But how should I ditch an EntitySet? –  Feb 12 '17 at 11:25
  • By deleting references to it from your code and using something else like Entity Framework or Linq2Sql. – GSerg Feb 12 '17 at 11:26
  • @GSerg As I said I used Linq2Sql already. This Entityset is generated by Linq2Sql. So how should I delete the reference to it from this line? `var count = tbl.Related_Huge_Table_Data.Count();` –  Feb 12 '17 at 11:28
  • By revising your `tblInfo` function. Normally with Linq2Sql you create an instance of the context where `ctx.TableName` is an `IQueryable` and `ctx.TableName.Count` is performed on the server. – GSerg Feb 12 '17 at 11:32
  • @GSerg I appreciate if you can show me a complete solution by providing an answer and tell me how can I change my query and by considering that the `tblInfo` and `Related_Huge_Table_Data` are generating by Linq2Sql. –  Feb 12 '17 at 11:34
  • I've never used Linq-to-Sql but with EF I would select only the Id. Something like: .select(x => x.Id).Count() –  Feb 12 '17 at 12:08
  • I don't really understand your setup. What is `tblInfo` and where does it come from? – GSerg Feb 12 '17 at 12:29
  • @GSerg tblInfo is a method that is retrieving a firstordefault of `MainTable`. –  Feb 12 '17 at 13:44
  • Have it return a query instead, not the result. Remove `.FirstOrDefault` before returning from `tblInfo` (change the return type accordingly). – GSerg Feb 12 '17 at 13:48
  • @GSerg Please check my updated question. –  Feb 12 '17 at 13:54
  • Create a context outside of `tblInfo` (or make `tblInfo` a member function in `MyDataContext`), remove `.SingleOrDefault()` as I said, and use `var` instead of `MainTable`. – GSerg Feb 12 '17 at 13:58
  • @GSerg But if I remove `SingleOrDefault` then the `tblInfo` will return `IQueryable` then this line `var count = tbl.Related_Huge_Table_Data.Count();` will be meaningless because tbl doesn't have `.Related_Huge_Table_Data` since it is `IQueryable` and not `MainTable`. Well?? –  Feb 12 '17 at 14:02
  • 1
    As @GSerg answered, the best way is to start from the child collection and filter by parent. Also, add an index on ParentId (if you don't have one). If you still have problems, the problem is somewhere else. Try to see the generated SQL using *context.Database.Log = Console.WriteLine;* – George Vovos Feb 14 '17 at 11:53
  • @user5032790 Please show the SQL query you are comparing this LINQ to Entities statement to... without it your question is a bit nonsensical... It is most likely slower because you return all of the data, causing variables to be created and populated, and then you perform a count... compare that to simply performing the count in SQL; L2E will almost always be slower! – Paul Zahra Feb 21 '17 at 10:14

7 Answers

9

By calling SingleOrDefault() you execute the query and then have to deal with the results in memory. You need to stay with IQueryable until your query is fully constructed.

The easiest way to answer "how many child records this parent record has" is to approach it from the child side:

using (var dx = new MyDataContext())
{
    // If you have an association between the tables defined in the context
    int count = dx.Related_Huge_Table_Datas.Where(t => t.MainTable.id == 42).Count();

    // If you don't
    int count = dx.Related_Huge_Table_Datas.Where(t => t.parent_id == 42).Count();
}

If you insist on the parent side approach, you can do that too:

using (var dx = new MyDataContext())
{
    int count = dx.MainTables.Where(t => t.id == 42).SelectMany(t => t.Related_Huge_Table_Datas).Count();
}

If you want to keep part of this query in a function like tblInfo, you can, but you cannot instantiate MyDataContext inside such a function, otherwise you will get an exception when trying to use the query with another instance of MyDataContext. So either pass MyDataContext to tblInfo or make tblInfo a member of the partial class MyDataContext:

public static IQueryable<MainTable> tblInfo(MyDataContext dx, int id)
{
    return dx.MainTables.Where(t => t.id == id);
}

...

using (var dx = new MyDataContext())
{
    int count = tblInfo(dx, 42).SelectMany(t => t.Related_Huge_Table_Datas).Count();
}
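
For completeness, the partial-class variant could look roughly like this. This is only a sketch, not part of the original answer; it assumes the generated context is named MyDataContext and exposes MainTables and Related_Huge_Table_Datas, as in the code above:

// Sketch only: the partial class must live in the same namespace
// as the generated MyDataContext so the compiler merges them.
public partial class MyDataContext
{
    public IQueryable<MainTable> tblInfo(int id)
    {
        return MainTables.Where(t => t.id == id);
    }
}

...

using (var dx = new MyDataContext())
{
    // The count is still translated to SQL and executed on the server.
    int count = dx.tblInfo(42).SelectMany(t => t.Related_Huge_Table_Datas).Count();
}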
GSerg
  • Thanks. I tried all of these ways but no luck. It still takes too long (about 20 seconds) –  Feb 14 '17 at 09:36
  • 3
    @user5032790 Start a trace in the profiler and see what query hits the server when you execute `Count()`. Mine looks very reasonable. – GSerg Feb 14 '17 at 12:29
2

Try this

MyDataContext context = new MyDataContext();
var count = context.Related_Huge_Table_Data.Where(o => o.Parentid == id).Count();

// or, going straight to SQL - ExecuteQuery is the LINQ to SQL counterpart of EF's Database.SqlQuery,
// and the {0} placeholder is passed as a parameter instead of being concatenated into the SQL string
int count2 = context.ExecuteQuery<int>(
    "SELECT COUNT(1) FROM Related_Huge_Table_Data WHERE Parentid = {0}", id).FirstOrDefault();
vicky
1

If you wish to take full advantage of your SQL database's performance, it may make sense to query it directly rather than through LINQ. It should be reasonably more performant :)

// Requires System.Data, System.Data.SqlClient and System.Diagnostics.
var tableName = "Related_Huge_Table_Data"; // input table name here
var idColumn = "ParentId";                 // input id column name here
var id = 42;                               // id value to filter on
var connectionString = "user id=USERNAME; password=PASSWORD; server=SERVERNAME; Trusted_Connection=YESORNO; database=DATABASE; connection timeout=30";

SqlCommand sCommand = new SqlCommand();
sCommand.Connection = new SqlConnection(connectionString);
sCommand.CommandType = CommandType.Text;
sCommand.CommandText = $"SELECT COUNT(*) FROM {tableName} WHERE {idColumn} = @id";
sCommand.Parameters.AddWithValue("@id", id);
sCommand.Connection.Open();

SqlDataReader reader = sCommand.ExecuteReader();
var count = 0;
if (reader.HasRows)
{
    reader.Read();
    count = reader.GetInt32(0);
}
else
{
    Debug.WriteLine("Related_Huge_Table_Data: No rows returned in query.");
}
sCommand.Connection.Close();
0

Try this:

MyDataContext context = new MyDataContext();
var count = context.MainTables.GroupBy(x => x.ID).Distinct().Count();
Inside Man
0

GSerg's answer is the correct one in many cases. But when your table gets really huge, even a COUNT(1) directly in SQL Server is slow.

The best way to get around this is to query the database statistics directly, which is not possible with LINQ (or at least not that I know of).

The best thing you can do is to create a static method (C#) on your table definition which returns the result of the following query:

SELECT 
    SUM(st.row_count)
FROM 
    sys.dm_db_partition_stats st
WHERE 
    object_name(object_id) = '{TableName}' 
    AND (index_id < 2)

where {TableName} is the database name of your table.

Beware: this is only an answer for the case of counting all records in a table!
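
As an illustration only (not part of the original answer): assuming the LINQ to SQL MyDataContext from the question, the statistics query could be run through DataContext.ExecuteQuery, with the physical table name passed as a {0} parameter:

// Sketch: approximate row count from partition statistics via LINQ to SQL.
// "Related_Huge_Table_Data" is assumed to be the physical table name.
// Note: reading sys.dm_db_partition_stats requires the VIEW DATABASE STATE permission.
using (var context = new MyDataContext())
{
    long rowCount = context.ExecuteQuery<long?>(@"
        SELECT SUM(st.row_count)
        FROM sys.dm_db_partition_stats st
        WHERE object_name(st.object_id) = {0}
          AND st.index_id < 2", "Related_Huge_Table_Data").Single() ?? 0;
}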

marc_s
Lostblue
  • 3
    If you have an index on a *small*, non-nullable column (e.g. on an `INT NOT NULL`), then a `COUNT(*)` is still quite fast even for very large tables ..... – marc_s Feb 15 '17 at 10:35
  • @marc_s can you explain (or point me to some link) how an index on a small non-nullable column increases the performance of arbitrary count queries? – Evk Feb 16 '17 at 18:48
  • 2
    @Evk: if you have a small (narrow) index that's non-nullable, then this index will contain exactly as many entries as the actual data, too - but since it's typically a lot smaller than the entire data record (e.g. only 4 bytes vs. several hundred or thousand bytes per record), scanning this index will take many fewer data pages to load, and thus a lot less time to complete – marc_s Feb 16 '17 at 19:41
0

Is your linq2sql returning the recordset and then doing the .Count() locally, or is it sending SQL to the server so the count is done on the server? There will be a big difference in performance there.

Also, have you inspected the SQL that's being generated when you execute the query? From memory, Linq2Sql allows you to inspect SQL (maybe by setting up a logger on your class?). In Entity Framework, you can see it when debugging and inspecting the IQueryable<> object, not sure if there's an equivalent in Linq2Sql.

Way to view SQL executed by LINQ in Visual Studio?

Alternatively, use the SQL Server Profiler (if available), or somehow see what's being executed.
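
For what it's worth, here is a small sketch of the LINQ to SQL hooks mentioned above (DataContext.Log and DataContext.GetCommand are standard members of System.Data.Linq.DataContext; the query itself is just illustrative):

using (var context = new MyDataContext())
{
    // Echo every SQL statement LINQ to SQL sends to the server.
    context.Log = Console.Out;

    var query = context.MainTables
                       .Where(t => t.id == 42)
                       .SelectMany(t => t.Related_Huge_Table_Datas);

    // Inspect the translated SQL without executing the query.
    Console.WriteLine(context.GetCommand(query).CommandText);

    int count = query.Count(); // if this logs a single SELECT COUNT(*), the count runs on the server
}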

AndrewP
0

You may try the following:

  var c = from rt in context.Related_Huge_Table_Data
          join t in context.MainTables
              on rt.MainTableId equals t.id
          where t.id == id
          select new { rt.id };

  var count = c.Distinct().Count();
jitender