Currently, I'm trying to implement a data reader to run a particularly large query. The current implementation uses Entity Framework, but because of the nature of the query it's incredibly slow (roughly four and a half minutes).
Here is the current implementation with EF:
```csharp
public List<SomeDataModel> GetSomeData(List<string> SomeValues, string setId)
{
    var ret = new List<SomeDataModel>();
    using (var context = new SomeDBContext())
    {
        // ToList() executes the query; IQueryable<T> itself has no ForEach method
        var data = context.SomeEntity
            .Where(x => x.SetId == setId && SomeValues.Contains(x.SomeValue))
            .ToList();
        // mapper is an AutoMapper instance injected via dependency injection
        data.ForEach(x => ret.Add(mapper.Map<SomeDataModel>(x)));
    }
    return ret;
}
```
Ideally I'd like to generate a simpler query string and read the data through an `OracleDataReader`. The problem is that Oracle allows at most 1,000 expressions in a single `IN` list (exceeding it raises ORA-01795), and the `SomeValues` parameter can hold anywhere from 5,000 to 25,000 values. I imagine EF is generating multiple queries on its own behind the scenes, but as I said, it's incredibly slow.
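The chunking arithmetic itself is simple: ceil(n / 1000) pages, so 5,000 values need 5 queries and 25,000 need 25. Below is a minimal sketch of a reusable splitter, assuming the values arrive as a plain `List<T>`; `ChunkHelper` and `SplitIntoChunks` are hypothetical names, not part of EF or ODP.NET. (On .NET 6 and later, `Enumerable.Chunk` does the same job, but an EF6 project is likely on .NET Framework where it isn't available.)

```csharp
using System;
using System.Collections.Generic;

public static class ChunkHelper
{
    // Oracle rejects a single IN list with more than 1000 expressions (ORA-01795)
    public const int OracleInListLimit = 1000;

    // Splits `values` into consecutive sub-lists of at most `chunkSize` items.
    // GetRange copies each element once, instead of re-scanning with Skip/Take per page.
    public static List<List<T>> SplitIntoChunks<T>(List<T> values, int chunkSize = OracleInListLimit)
    {
        if (values == null) throw new ArgumentNullException(nameof(values));
        if (chunkSize <= 0) throw new ArgumentOutOfRangeException(nameof(chunkSize));

        var chunks = new List<List<T>>();
        for (var start = 0; start < values.Count; start += chunkSize)
        {
            var count = Math.Min(chunkSize, values.Count - start);
            chunks.Add(values.GetRange(start, count));
        }
        return chunks;
    }
}
```

Each chunk would then become one command executed on the same open connection, so the number of round trips grows linearly with the size of `SomeValues`.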
This is sort of the direction I'm trying to take it:
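```csharp
// using System; using System.Collections.Generic; using System.Linq;
// using Oracle.ManagedDataAccess.Client; (or Oracle.DataAccess.Client)
public List<SomeDataModel> GetSomeData(List<string> SomeValues, string setId)
{
    var ret = new List<SomeDataModel>();
    const int MAX_CHUNK_SIZE = 1000; // Oracle allows at most 1000 items per IN list
    var pageList = new List<List<string>>();
    var totalPages = (int)Math.Ceiling((decimal)SomeValues.Count / MAX_CHUNK_SIZE);
    for (var i = 0; i < totalPages; i++)
    {
        var chunkItems = SomeValues.Skip(i * MAX_CHUNK_SIZE).Take(MAX_CHUNK_SIZE).ToList();
        pageList.Add(chunkItems);
    }
    using (var context = new CASTDbContext())
    {
        var connStr = context.Database.Connection.ConnectionString;
        using (var conn = new OracleConnection(connStr))
        {
            conn.Open(); // the connection must be open before ExecuteReader
            foreach (var page in pageList)
            {
                using (var cmd = new OracleCommand())
                {
                    cmd.Connection = conn;
                    cmd.BindByName = true; // ODP.NET binds by position unless this is set
                    // One bind variable per value (:p0, :p1, ...) instead of concatenating
                    // the raw strings, which left them unquoted and open to SQL injection
                    var placeholders = new List<string>();
                    for (var idx = 0; idx < page.Count; idx++)
                    {
                        placeholders.Add(":p" + idx);
                        cmd.Parameters.Add("p" + idx, page[idx]);
                    }
                    cmd.Parameters.Add("setId", setId);
                    // SetId is assumed to be the column behind the EF SetId property
                    cmd.CommandText = string.Format(
                        "SELECT * FROM SomeTable WHERE SetId = :setId AND SomeColumn IN ({0})",
                        string.Join(",", placeholders));
                    using (var reader = cmd.ExecuteReader())
                    {
                        while (reader.Read())
                        {
                            var newItem = new SomeDataModel();
                            newItem.Something = reader["Something"].ToString();
                            ret.Add(newItem);
                        }
                    }
                }
            }
        }
    }
    return ret;
}
```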
The result I'm after is either to generate the multiple reader queries efficiently, or to construct a single query that handles this scenario effectively. The second example is essentially placeholder code at the moment.
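For the single-query option, one commonly used workaround relies on the 1,000-expression limit applying to each `IN` list separately, so several lists can be OR'ed together: `SomeColumn IN (:p0, ..., :p999) OR SomeColumn IN (:p1000, ...)`. The sketch below only builds the SQL text and the bind-variable names; it assumes the placeholder `SomeTable`/`SomeColumn` names from the question plus a `SetId` column, and `InListBuilder`/`BuildQuery` are hypothetical names. A statement with thousands of bind variables can be expensive to parse, so it's worth timing against the chunked version.

```csharp
using System;
using System.Collections.Generic;

public static class InListBuilder
{
    // Builds "SELECT * FROM SomeTable WHERE SetId = :setId AND (SomeColumn IN (...) OR ...)"
    // with one bind variable (:p0, :p1, ...) per value and at most maxPerList per IN list.
    // Table/column names are placeholders; SetId is an assumed column name.
    public static string BuildQuery(int valueCount, int maxPerList, out List<string> paramNames)
    {
        if (valueCount <= 0) throw new ArgumentOutOfRangeException(nameof(valueCount));
        if (maxPerList <= 0 || maxPerList > 1000) throw new ArgumentOutOfRangeException(nameof(maxPerList));

        paramNames = new List<string>(valueCount);
        var lists = new List<string>();
        for (var start = 0; start < valueCount; start += maxPerList)
        {
            var end = Math.Min(start + maxPerList, valueCount);
            var placeholders = new List<string>();
            for (var i = start; i < end; i++)
            {
                paramNames.Add("p" + i);    // name as added to OracleCommand.Parameters
                placeholders.Add(":p" + i); // name as it appears in the SQL text
            }
            lists.Add("SomeColumn IN (" + string.Join(", ", placeholders) + ")");
        }

        return "SELECT * FROM SomeTable WHERE SetId = :setId AND ("
            + string.Join(" OR ", lists) + ")";
    }
}
```

Usage would be roughly: call `BuildQuery(SomeValues.Count, 1000, out names)`, set `BindByName = true` on the `OracleCommand`, then add `names[i]` with `SomeValues[i]` for every value plus `"setId"` with `setId`, and read the results with a single `ExecuteReader` call.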