0

I'm dumping a table out of MySQL into a DataTable object using MySqlDataAdapter. Database input and output is doing fine, but my application code seems to have a performance issue I was able to track down to a specific LINQ statement.

The goal is simple, search the contents of the DataTable for a column value matching a specific string, just like a traditional WHERE column = 'text' SQL clause.

Simplified code:

foreach (String someValue in someList) {
    String searchCode = OutOfScopeFunction(someValue);
    var results = emoteTable.AsEnumerable()
        .Where(myRow => myRow.Field<String>("code") == searchCode)
        .Take(1);
    if (results.Any()) {
        results.First()["columnname"] = 10;
    }
}

This simplified code is executed thousands of times, once for each entry in someList. When I run Visual Studio Performance Profiler I see that the "results.Any()" line is highlighted as consuming 93.5% of the execution time.

I've tried several different methods for optimizing this code, but none have improved performance while keeping the emoteTable DataTable as the primary source of the data. I can convert emoteTable to Dictionary<String, DataRow> outside of the foreach, but then I have to keep the DataTable and the Dictionary in sync, which while still a performance improvement, feels wrong.

Three questions:

  1. Is this the proper way to search for a value in a DataTable (equivalent of a traditional SQL WHERE clause)? If not, how SHOULD it be done?
  2. Addendum to 1, regardless of the proper way, what is the fastest (execution time)?
  3. Why does the results.Any() line consume 90%+ resources? In this situation it makes more sense that the var results line should consume the resources, after all, it's the line doing the actual search, right?

Thank you for your time. If I find an answer I shall post it here as well.

Caesar Kabalan
  • 753
  • 1
  • 8
  • 18
  • Why are you using .Where().Take(1) and then results.Any()? Why not `var result = emoteTable.AsEnumerable().FirstOrDefault(myRow => myRow.Field("code") == searchCode)` ? Then you can just check if result is null.... – ProgrammingLlama Apr 05 '15 at 19:06

2 Answers2

0

Any() is taking 90% of the time because the result is only executed when you call Any(). Before you call Any(), the query is not actually made.

It would seem the problem is that you first fetch entire table into the memory and then search. You should instruct your database to search.
Moreover, when you call results.First(), the whole results query is executed again.

With deferred execution in mind, you should write something like

var result = emoteTable.AsEnumerable()
    .Where(myRow => myRow.Field<String>("code") == searchCode)
    .FirstOrDefault();

if (result != null) {
    result["columnname"] = 10;
}
Community
  • 1
  • 1
GSerg
  • 76,472
  • 17
  • 159
  • 346
  • Maybe I'm mis-using the `DataTable` and `DataAdapter` constructs then. I thought the purpose of these classes were to allow faster client-side interaction with the data, then push the changes in bulk to the database. My code in question pulls down a table, compares it with local data, as it finds differences it updates the `DataTable`, when finished it pushes changes to the database (via the `DataAdapter`). Is there a better way to accomplish this? – Caesar Kabalan Apr 05 '15 at 18:57
  • I believe the "faster" part comes from the fact that these are lower level constructs and therefore have less overhead, although it might be very small. I can't tell you what way would be better for your particular purpose, you should try a couple and see (e.g. Linq2Sql data context with `SubmitChanges()`, a stored procedure in MySQL, a direct query in a loop). It is fine if you keep your current approach, just make sure you understand the Linq deferred execution concept. – GSerg Apr 05 '15 at 19:07
0

What you have implemented is pretty much join :

var searchCodes = someList.Select(OutOfScopeFunction);
var emotes = emoteTable.AsEnumerable();

var results = Enumerable.Join(emotes, searchCodes, e=>e, sc=>sc.Field<String>("code"), (e, sc)=>sc);

foreach(var result in results)
{
   result["columnname"] = 10;
}

Join will probably optimize the access to both lists using some kind of lookup.

But first thing I would do is to completely abandon idea of combining DataTable and LINQ. They are two different technologies and trying to assert what they might do inside when combined is hard.

Did you try doing raw UPDATE calls? How many items are you expecting to update?

Euphoric
  • 12,645
  • 1
  • 30
  • 44
  • I initially did raw update calls and the performance was OK except for the fact I was executing a LARGE number of them. I was under the impression I could copy the table in-memory via DataAdapter/DataTable and make the modifications in the DataTable, then sync it back to the server. Maybe my architecture is flawed. I'm open to ideas where you have a ~10k tow table that you'll need to make thousands of modifications to (updating different rows). – Caesar Kabalan Apr 05 '15 at 19:43