1

Currently, I'm trying to implement a data reader to perform a particularly large query. The current implementation uses Entity Framework, but due to the nature of the query, it's incredibly slow (somewhere around 4 and a half minutes).

Here is the current implementation with EF:

public List<SomeDataModel> GetSomeData(List<string> SomeValues, string setId)
{
    var ret = new List<SomeDataModel>();

    using(var context = new SomeDBContext())
    {
        var data = context.SomeEntity.Where(x => x.SetId == setId && SomeValues.Contains(x.SomeValue));
        data.ForEach(x => ret.Add(mapper.Map<SomeDataModel>(x))); // mapper is an instance of AutoMapper via dependency injection
    }
    return ret; 
}

Ideally I'd like to generate a more basic query string and pull data through an OracleDataReader. The issue is this: in an IN statement in Oracle, you can only have 1000 values. The SomeValues parameter can be anywhere from 5,000 to 25,000, so I imagine on the back end EF is generating multiple queries on its own, but like I said, it's incredibly slow.

This is sort of the direction I'm trying to take it:

public List<SomeDataModel> GetSomeData(List<string> SomeValues, string setId)
{
    var ret = new List<SomeDataModel>();
    const int MAX_CHUNK_SIZE = 1000;
    var totalPages = (int)Math.Ceiling((decimal)SomeValues.Count / MAX_CHUNK_SIZE);

    for(var i = 0; i < totalPages; i++)        
    {
            var chunkItems = SomeValues.Skip(i * MAX_CHUNK_SIZE).Take(MAX_CHUNK_SIZE).ToList();
            pageList.Add(chunkItems);
    }

        using (var context = new CASTDbContext())
        {
            var connStr = context.Database.Connection.ConnectionString;
            using (var conn = new OracleConnection(connStr))
            {
                foreach(var page in pageList)
                {
                    var queryStr = string.Format("SELECT * FROM SomeTable WHERE SomeColumn IN ({0})", "(" + string.Join(",", page.ToArray())  + ")");
                    var cmd = new OracleCommand(queryStr, conn);
                    using (var reader = cmd.ExecuteReader())
                    {
                        while(reader.Read())
                        {
                            var newItem = new SomeDataModel();
                            newItem.Something = reader["Something"].ToString();
                            ret.Add(newItem);
                        }
                    }
                }                                      
            }
        }
    return ret; 
}

The desired results I suppose are to either efficiently generate multiple queries for the reader, or construct a single query that can handle this scenario in an effective way. What I have in that second example is sort of a placeholder code at the moment.

Dortimer
  • 619
  • 8
  • 25
  • Why are you using `OracleDataReader`? Why don't you run a stored procedure using EF? – Train Aug 15 '19 at 17:37
  • Unfortunately, there's a sort of superficial requirement to not use a Stored Procedure. The table it references is incredibly volatile, and there have been problems with other stored procedures referencing it, so I was told to not use one. – Dortimer Aug 15 '19 at 17:39
  • If you're splitting the list of items to query into pages anyway, did you try doing that with EF? How's the performance on that? I mean build the `pageList` like in your second example, and then do the `Contains` query for each page? – gnud Aug 15 '19 at 17:43
  • Shame you can't use a proc, as otherwise you could have passed all the values [as an array](https://stackoverflow.com/questions/10217922/how-to-use-array-table-parameter-to-oracle-odp-net-10g-via-ado-net-c) and the query would probably take no time at all. – stuartd Aug 15 '19 at 17:43
  • @gnud I tried going that route as well, but there wasn't really much of a difference in performance. – Dortimer Aug 15 '19 at 17:46
  • What version of EF are you using? Have you tried executing sql directly with EF? Or can you create the SP in code and drop it when you're done? – Train Aug 15 '19 at 17:48
  • I can't help but think the performance problem is in that you're doing each page as a separate query in a loop, not in the actual technique in the link stuartd provided. When you tried it, were you still doing the loop? – madreflection Aug 15 '19 at 17:52
  • @madreflection I've tried both running a single query and multiple queries with less than impressive results. – Dortimer Aug 15 '19 at 17:53
  • You could try to join to a literal table using `sys.odcinumberlist` (https://stackoverflow.com/questions/10353969/how-can-i-select-from-list-of-values-in-oracle) instead of doing IN. I'm not sure about the max number of elements when building that list though. Or, there's always a temporary table. – gnud Aug 15 '19 at 17:57

2 Answers2

2

At first,

Oracle can take more than 1000 values in IN list if we use the pair of column.

So, following will throw error:

SELECT * FROM TABLE 
WHERE COL1 IN (VAL1, VAL2,... VAL1000, VAL1001);

But following will work:

SELECT * FROM TABLE 
WHERE (COL,1) IN ( (VAL1,1), (VAL2,1),... (VAL1000,1), 
(VAL1001,1).....(VAL9999,1) ); 
-- we have used pair of value and 1 to be compared with col and 1

Hope, This will give you direction in solving the problem. And yes, I am not sure about performance so please check it at your end.

Cheers!!

Popeye
  • 35,427
  • 4
  • 10
  • 31
  • Seems like a good way to work around Oracle's limitations. I've only tried it with a data-reader, but performance doesn't quite hold up to EF. – Dortimer Aug 16 '19 at 18:04
0

Something that might help:

Depending on what your SomeEntity and SomeDataModel look like you can be loading values from the DB that you don't need or even tripping lazy loading because in populating a data model it is referencing a related entity that isn't eager loaded. With Automapper I would recommend leveraging the ProjectTo method into the Linq query. This will ensure that the data hit only happens once and returns just the fields needed for the data model.

var results = context.SomeEntity
    .Where(x => x.SetId == setId && SomeValues.Contains(x.SomeValue))
    .ProjectTo<SomeDataModel>()
    .ToList();

or with Automapper 9:

var results = mapper.ProjectTo<SomeDataModel>(context.SomeEntity
    .Where(x => x.SetId == setId && SomeValues.Contains(x.SomeValue)))
    .ToList();

Querying across a large number of arbitrary values like that is never going to be efficient. You should also be checking whether the data is indexed or not. Once you capture the SQL being run to retrieve your data, feed it through a suitable analyzer (I'm not that familiar with Oracle tooling) to see if there are index suggestions.

Steve Py
  • 26,149
  • 3
  • 25
  • 43