2

We're currently using LINQ to generate SQL queries, with a bit of magic inside to handle case-specific queries.

Up until now, it's worked fine; very fast, hardly any issues. We've recently run into efficiency issues when querying a large amount of data from the database.

We construct the query as such:

var someIntList = new List<int> { 1,2,3,4,5 };
var query = dtx.Query.Containers.Where(c => c.ContainerID.IsIn(someIntList));

or

var someStringList = new List<int> {"a", "b", "c" };
query = dtx.Query.Containers.Where(c => c.BuildingName.IsIn(someStringList));

Which would generate (along with a bunch of other stuff which isn't related to this):

SELECT * FROM Container WHERE ContainerID IN (1,2,3,4,5)

and

SELECT * FROM Container WHERE BuildingName IN ('a','b','c')

Now in this particular situation, we need to return 50,000 rows .. which is generated through 5 seperate queries, splitting up the load. The DB returns fairly quickly (within seconds), however generating the query takes a long time.

Here's the very last function which is called to generate this particular query:

private static string GetSafeValueForItem(object item)
{
    if (item == null)
        return "NULL";

    if (item is bool)
        return ((bool)item ? "1" : "0");
    if (item is string)
        return string.Format("'{0}'", item.ToString().Replace("'", "''"));
    if (item is IEnumerable)
        return ListToDBList((IEnumerable)item);
    if (item is DateTime)
        return string.Format("'{0}'", ((DateTime)item).ToString("yyyy-MM-dd HH:mm:ss"));

    return item.ToString();
}

private static string ListToDBList(IEnumerable list)
{
    var str = list.Cast<object>().Aggregate("(", (current, item) => current + string.Format("{0},", GetSafeValueForItem(item)));
    str = str.Trim(',');
    str += ")";
    return str;
}

Are there any obvious improvements which can be made to speed up the string concatenation in this case? Refactoring the code and using a different implementation (such as avoiding the query generating and hitting the database directly) is not preferred, but if it offered a big performance boost would be great to hear.

Rob
  • 26,989
  • 16
  • 82
  • 98

4 Answers4

5

Your Aggregate code is basically string concatenation in a loop. Don't do that.

Options:

  1. Use StringBuilder
  2. Use string.Join
Jon Skeet
  • 1,421,763
  • 867
  • 9,128
  • 9,194
  • 2
    To explain the reason behind this: string concatenation allocats a new memory buffer for each concatenation, copies BOTH string to that and assigns the new buffer to the variable. Looping over a long list causes massiv memory allocation and copying which is very expensive performancevize. StringBuilder and string.join precalculates the total space required for all elements and copies them only once. – David Mårtensson Jan 20 '11 at 10:06
2

Here's an example using String.Join that outputs the same as your ListToDBList:

String.Format("({0})", String.Join(",", list.Cast<object>().Select(item=>GetSafeValueForItem(item)).ToArray()));

See here for an explanation why concatenating in a loop using + (which is what your call to Aggregate was doing) is slow: http://www.yoda.arachsys.com/csharp/stringbuilder.html

Weeble
  • 17,058
  • 3
  • 60
  • 75
1

I haven't made test cases and profiled your code, so I don't know how much improvement you can expect.

Use a StringBuilder instead of String.Format and the += operator. The += operator is known to be slow. I suspect String.Format is going to be somewhat slow, too.

You could also try string.Join instead of manually joining the array. It works on IEnumerable in newer versions of the .NET framework (4.0?).

Merlyn Morgan-Graham
  • 58,163
  • 16
  • 128
  • 183
  • String.Format use StringBuilder internally (http://stackoverflow.com/questions/6785/is-string-format-as-efficient-as-stringbuilder), but since it is used many separate times I agree with your point. – Andreas Ågren Jan 20 '11 at 09:38
  • Multiple string.format will be as slow as += as you repetedly allocates new memry and copies the data. String builder only allocates memory and copies in the end when you do the ToString call. – David Mårtensson Jan 20 '11 at 10:08
0

Not sure why you're doing list.Cast when a plain IEnumerable will be of objects anyway. But your whole ListToDBList can be replaced by

string.Format("({0})", string.Join(",",list.ToArray())); 

Not sure how much quicker it would be, but it's clearer to my mind.

Massif
  • 4,327
  • 23
  • 25
  • 1
    The cast is required because IEnumerable doesn't implement IEnumerable. (It couldn't - it would be a circular inheritance graph.) At least for me on .NET 3.5, I can't apply any Linq extension methods to IEnumerable without first using Cast to convert it into an IEnumerable. – Weeble Jan 20 '11 at 09:54