-1

I have one table with 500,000 records, and within my application I make some filters and the query returns me 410,000, now dont get me wrong, I do not display that data on the screen, its a financial application where every day that table needs to processed and make some calculations in memory and write some totals back to other tables.

I dont want to paste my entire model as it wouldnt fit the purpose of this question, the questions is how can I optimize EF to make the query faster, I tried using SQL Profile and to return those 410,000 it takes 11 minutes.

private List<MasterVenta> FilterMasterVentas(RuleEditor filter, ComisionPorProveedor comisionPorProveedor)
        {
            Periodo periodo = comisionPorProveedor.Periodo;
            var strReferences = new StringBuilder();
            var Modelo_PuntoVentaProveedor = nameof(MasterVenta.PuntoVentaProveedor);
            var Modelo_PuntoVentaProveedor_proveedor = Modelo_PuntoVentaProveedor + "." + nameof(PuntosVentaProveedor.Proveedor);
            var Modelo_PuntoVentaProveedor_Liquidaciones = Modelo_PuntoVentaProveedor + "." + nameof(PuntosVentaProveedor.Liquidaciones);
            var Modelo_PuntoVentaProveedor_Proveedor_TipoDeCanal = Modelo_PuntoVentaProveedor_proveedor + "." + nameof(Proveedor.TipoDeCanal);
            //var Modelo_PuntoVentaProveedor_Proveedor_Productos = Modelo_PuntoVentaProveedor_proveedor + "." + nameof(Proveedor.Productos);
            var Modelo_PuntoVentaProveedor_Proveedor_ContactosProveedor = Modelo_PuntoVentaProveedor_proveedor + "." + nameof(Proveedor.ContactosProveedor);
            var Modelo_PuntoVentaProveedor_Proveedor_Contrato = Modelo_PuntoVentaProveedor_proveedor + "." + nameof(Proveedor.Contrato);
            var Modelo_PuntoVentaProveedor_Proveedor_ComisionesPorProveedor = Modelo_PuntoVentaProveedor_proveedor + "." + nameof(Proveedor.ComisionesPorProveedor);


            strReferences.Append(Modelo_PuntoVentaProveedor);
            strReferences.Append("," + Modelo_PuntoVentaProveedor_Liquidaciones);
            strReferences.Append("," + Modelo_PuntoVentaProveedor_Proveedor_TipoDeCanal);
            //strReferences.Append("," + Modelo_PuntoVentaProveedor_Proveedor_Productos);
            strReferences.Append("," + Modelo_PuntoVentaProveedor_Proveedor_ContactosProveedor);
            strReferences.Append("," + Modelo_PuntoVentaProveedor_Proveedor_Contrato);
            strReferences.Append("," + Modelo_PuntoVentaProveedor_Proveedor_ComisionesPorProveedor);

            //el master de ventas debe filtrase por el periodo, PuntoVenta, TipoDeCanal,Producto

            List<int> puntosVenta = comisionPorProveedor.Proveedor.PuntosVentaProveedor.Select(p => p.PuntoVentaProveedorId).ToList();//tipo de canal de proveedor de acuerdo a la tabla comisionPorProveedor



            if (filter.Rule.IsEmpty()) return null;
            if (!filter.Rule.IsValid()) return null;
            var q = (from c in this.unitOfWork.MasterVentaRepository.Get(null, o => o.OrderBy(x => x.MasterVentaID), strReferences.ToString()).Filter<MasterVenta>(filter.Rule.GetRuleXml())
                     where c.FechaVenta != null &&
                     (
                      (DateTime.Compare(c.FechaVenta.Value, periodo.FechaFinal) <= 0)
                      && (DateTime.Compare(c.FechaVenta.Value, periodo.FechaInicial) >= 0)
                     )
                     &&
                     (c.PuntoVentaProveedor != null &&
                     (
                     (puntosVenta.Any(pv => c.PuntoVentaProveedor != null && pv == c.PuntoVentaProveedor.PuntoVentaProveedorId)
                     || c.PuntoVentaProveedor.PuntoDeVentaHistorials.Any(p => c.PuntoVentaProveedor != null && c.FechaVenta >= p.FechaInicio && c.FechaVenta <= p.FechaFin))


                     ))

                     select c);
            var result = q.ToList();
            return result;


        }

The line that takes 11 minutes is this one

var q = (from c in this.unitOfWork.MasterVentaRepository.Get(null, o => o.OrderBy(x => x.MasterVentaID), strReferences.ToString()).Filter<MasterVenta>(filter.Rule.GetRuleXml())
                     where c.FechaVenta != null &&
                     (
                      (DateTime.Compare(c.FechaVenta.Value, periodo.FechaFinal) <= 0)
                      && (DateTime.Compare(c.FechaVenta.Value, periodo.FechaInicial) >= 0)
                     )
                     &&
                     (c.PuntoVentaProveedor != null &&
                     (
                     (puntosVenta.Any(pv => c.PuntoVentaProveedor != null && pv == c.PuntoVentaProveedor.PuntoVentaProveedorId)
                     || c.PuntoVentaProveedor.PuntoDeVentaHistorials.Any(p => c.PuntoVentaProveedor != null && c.FechaVenta >= p.FechaInicio && c.FechaVenta <= p.FechaFin))


                     ))

the .get method is like this one:

public virtual IEnumerable<TEntity> Get(
        Expression<Func<TEntity, bool>> filter = null,
        Func<IQueryable<TEntity>, IOrderedQueryable<TEntity>> orderBy = null,
        string includeProperties = "")
    {
      IQueryable<TEntity> query = dbSet.AsNoTracking();

      if (filter != null)
      {
        query = query.Where(filter);
      }

      foreach (var includeProperty in includeProperties.Split
          (new[] { ',' }, StringSplitOptions.RemoveEmptyEntries))
      {
        query = query.Include(includeProperty);
      }
Luis Valencia
  • 32,619
  • 93
  • 286
  • 506
  • Don't paste the entire model but please post what indexes you have on the relevant tables. Also - can you get an execution plan for this query? If so please post that. – Abe Miessler Jun 10 '16 at 05:25
  • if im reading your query correctly then the first thing you're doing is ordering it? this it not very good for efficiency – Toxicable Jun 10 '16 at 05:28
  • Check what SQL it's generating. Run it through the query analyser. See what indexes it's hitting. Find which parts are taking the largest amount of time to execute. – Rob Jun 10 '16 at 05:31
  • Is your repository using EF as the underlying framework? if so then using another unit of work layer ontop of that is pointless since EF is already a unit of work. I ask this because that method looks exactly like the one on asp.net, i'd suggest removing it which would clean up your query a bit in terms of readability – Toxicable Jun 10 '16 at 05:34
  • The critical point is that you are not using Entity Framework. You are pulling ALL 500k rows into memory and THEN doing the filtering. – Euphoric Jun 10 '16 at 05:35
  • @Euphoric why? how, what are we doing wrong in the code? can you please point me in the right direction – Luis Valencia Jun 10 '16 at 05:39
  • I addad AsNoTracking and time was reduced to 2 minutes and 32 seconds – Luis Valencia Jun 10 '16 at 05:39
  • If whole table is 500k rows and query returns 400k (almost whole table), indexes seems not very relevant - it is close to full table scan anyway. Most likely that is EF tracking stuff. – Evk Jun 10 '16 at 05:40
  • @EstebanV Is it an option that you write your processing logic using SQL stored procedures ? instead of dragging all these records to memory in order to do some calculations and save them back to SQL ? – Zein Makki Jun 10 '16 at 05:46

4 Answers4

2

Ok, mentioning that you use Code Efects in one of the comments, is extremely critical to your question.

Looking at their webpage, it seems they are able to integrate with Entity Framework to produce proper Where clause. You are just using it incorrectly.

To work correctly, it needs to be run on IQueryable instead of IEnumerable. If you get rid of the Get method and instead run it directly on the repository it should start producing proper SQL database query instead of doing the filtering in memory.

Related quote from their documentation (emphasis mine):

This was a quick example of using the Code Effects' Filter() extension method with LINQ to Object, but the real advantage of Code Effects' Rule-Based Data Filtering is demonstrated by applying filters to Entity Framework queries. The job of the LINQ to Entity provider is to convert all expressions into SQL statements and execute them on the database server. Because the Filter extension holds a proper expression, the provider converts the filter into the proper SQL where clause, runs the resulting statement on the server, and returns the already filtered result set instead of retrieving all data first and then filtering the result set in memory.

Euphoric
  • 12,645
  • 1
  • 30
  • 44
1

Generally the delay is caused by network latency and the time it takes to transfer that amount of data. You want to minimise the amount of data which gets transferred, so a couple of approaches:

  1. Create a lightweight object which maps to the table and only returns the columns from the table you absolutely need.
  2. Can you aggregate any of the data on SQL Server and return the aggregated data for your calculations?
Dave Barker
  • 6,303
  • 2
  • 24
  • 25
1

Entity Framework exposes a number of performance tuning options to help you optimise the performance of your applications. One of these tuning options is .AsNoTracking(). This optimisation allows you to tell Entity Framework not to track the results of a query. This means that Entity Framework performs no additional processing or storage of the entities which are returned by the query. However it also means that you cant update these entities without reattaching them to the tracking graph.

Source HERE

This should increase the performance considerably. You could check some more tweaking from the SQL generated . But then there will always be a point where you will have to go to the DB to optimise it like Sharding, Indexing etc

Community
  • 1
  • 1
Karthik
  • 929
  • 2
  • 12
  • 24
1

The problem is that this is not an LinqToEntityFramework query. This is a LinqToObject query.

Your linq to EntityFramework queries are here.

List<int> puntosVenta = comisionPorProveedor.Proveedor.PuntosVentaProveedor.Select(p => p.PuntoVentaProveedorId).ToList();//tipo de canal de proveedor de acuerdo a la tabla comisionPorProveedor

this.unitOfWork.MasterVentaRepository.Get(null, o => o.OrderBy(x => x.MasterVentaID), strReferences.ToString())

In both cases you are pulling an entire table worth of data into .net.

You cannot optimize the query, because the bottleneck is going to be BANDWIDTH.

This is a common mistake. You think that just because your code is in a Linq query, it is a LinqToEntityFramework query. This is wrong.

  • You will need to completely start from scratch here. You should NOT have any method calls at all in your query (INLINE EVERYTHING).

  • You cannot use any System methods like DateTime.Compare. Replace those with date1 > date2 inequalities (we aren't JAVA developers here).

  • Remember, for everything to work. EF needs to be able to convert it into a SQL query. So there should be NO custom .net types in your query (except your POCOs).

Aron
  • 15,464
  • 3
  • 31
  • 64