1

I want to have a large number of class instances return the same similar fields of data, like in this example implementation:

foreach (SomeClass sc in SomeClasses)
{
    System.Console.WriteLine(sc.GetData("1st field"));
    System.Console.WriteLine(sc.GetData("Another field"));
    System.Console.WriteLine(sc.GetData("and another"));
}

// ---- inside SomeClass:

Dictionary<string, string> myData;

public string GetData(string field)
{
    return myData[field];
}

What I don't like is the string hashing, lookup and matching that has to happen over and over again in the example (I assume thats how Dictionary works). I would really like to find a better approach.

Coming from the C world, I thought of assigning all fields a unique integer key, such that I can change into an array lookup:

// ---- inside SomeClass:

string[] MyData;

public string GetData(int field_key)
{
    return MyData[field_key];
}

Now the field lookup is efficient, but it just doesn't feel right in these "arrays are evil" times, and it is tedious and error prone to deal with the field_key integer.

I don't know if I'm chasing performance ghosts here, its just that I want to find a design that is both efficient and clean.

Suggestions?

Larsp
  • 65
  • 7

4 Answers4

4

Why don't you want a dictionary look-up? A very efficient implementation of a dictionary would be an index look up of the hash in an array. So the underlying implementation could boil down to the code in your second example. This would make it O(1)

Use the Dictionary

Andrew T Finnell
  • 13,417
  • 3
  • 33
  • 49
  • Yeah, I could let the dictionary look up the integer key. I suppose that would be a solution. – Larsp Jul 09 '11 at 21:34
  • Question: The fields are also represented by a class. Is it a good idea / possible to let the dictionary look up objects of this class? – Larsp Jul 09 '11 at 21:36
  • @Larsp Here is the issue. The rule of thumb is to never pre-optimize. Run the code through a Profiler, there are even some free ones. You ARE chasing ghosts until you profile the application. Dictionaries are crazy efficient. – Andrew T Finnell Jul 09 '11 at 21:36
  • 1
    @Andrew, but computing the hash depends on the size of the string, so it's not O(1). – svick Jul 09 '11 at 21:39
  • @svick That isn't how Big-O works. That doesn't get counted. We aren't talking performance we are talking complexity. Not to mention the fact that static strings can have a pre-computed hash, and the hash of a string only has to happen once since strings are immutable. – Andrew T Finnell Jul 09 '11 at 21:42
  • I know not to pre-optimize. Its just that I'm in the design phase, and would like to select the best data structures available from the toolbox. – Larsp Jul 09 '11 at 21:44
  • 1
    @Andrew, that's exactly how computing complexity using Big-O works. You can't say something is O(1) if the time it takes to compute depends on something that is potentially unbounded. And it would be possible to cache the hash, but `string.GetHashCode()` isn't implemented that way. – svick Jul 09 '11 at 21:52
  • @Andrew: A dictionary lookup on a string key isn't `O(N)` (the number of items in the dictionary), but it is `O(k)` (the length of the string). @svick is correct. – Ben Voigt Jul 09 '11 at 23:05
  • Here is another fellow Stackover Flow person that seems to agree with me. http://stackoverflow.com/questions/2456186/dictionary-lookup-o1-vs-linq-where Yes it will not ever truely be O(1) because of clashes with the lookup. A properly implemented dictionary can approach O(1) not O(k). If you have an article that rebukes this, please share. – Andrew T Finnell Jul 09 '11 at 23:07
  • @Andrew, I don't know what you still don't understand. The dictionary has to call `GetHashCode()` before doing the lookup. The current MS implementation of `string.GetHashCode()` is O(k), where k is the length of the string. The lookup itself is O(1) on average. So, the total complexity is O(k + 1) = O(k). – svick Jul 09 '11 at 23:11
  • @svick So we are arguing over two different things. I am able to abuse your argument because I said " A properly implemented dictionary". The fact the hashes are not cached on the Strings makes you technically correct but it's an abuse of Big-O. If we were to put a constraint on the String length not the items, such as max length = 100, then it's be O(100) which is O(1). I never mentioned specially .NET's implementation of String. I wrongfully assumed it would cache the hash, if it does not you are right. If it was implemented to cache I am right. This should go somewhere in the Wiki. – Andrew T Finnell Jul 09 '11 at 23:23
  • Not to continue this anymore, but the reason it's an abuse is because a .NET String is not "potentially unbounded". It's bounded by a 32-bit integer. So.. In Big-O that makes it a O(1) hash. It seems silly but it's just to compare the complexity of one algorithm with another. This all has nothing to do with the original question, but it was a fantastic discussion in any case. – Andrew T Finnell Jul 09 '11 at 23:46
  • Thanks for the interesting discussion. I take note that the string hashes are not cached, so the dictionary will have to compute the hash, look up and check equality for every field, for every iteration of the foreach. The number of fields may be significant, but is bounded, so I'm at O(n). The delay because of the dictionary will most likely be imperceptible - it just troubles me to do all this unnecessary work, coming from battery economizing embedded development... – Larsp Jul 10 '11 at 09:36
2

If your instances have the same fields, why not just use properties?

foreach (SomeClass sc in SomeClasses)
{
    System.Console.WriteLine(sc.FirstField);
    System.Console.WriteLine(sc.AnotherField);
    System.Console.WriteLine(sc.AndAnother);
}
hammar
  • 138,522
  • 17
  • 304
  • 385
2

Because the fields are not known at compile time, but rather dynamic and user configurable, I'm going to modify your example program slightly to use an array of properties. Then I'd advocate an approach similar to yours but using your own custom class (here, called MyProperty) rather than string. Performance will be at least as good as (and maybe a tad better than) the string approach, but the benefit is that it gives you more flexibility: if you ultimately decide for performance reasons that you need to use an array or List approach, you can easily embed an array index into your MyProperty class. You'd have to change the implementation of GetData but not your calling code.

public static void Test1() {
  SomeClass[] SomeClasses; //created somehow

  //in real life, this would be determined dynamically
  var properties=new[] {SomeClass.FirstField, SomeClass.AnotherField, SomeClass.AndAnother};

  foreach(var sc in SomeClasses) {
    foreach(var property in properties) {
      Console.WriteLine(sc.GetData(property));
    }
  }
}

public class SomeClass {
  public static readonly MyProperty FirstField=new MyProperty();
  public static readonly MyProperty AnotherField=new MyProperty();
  public static readonly MyProperty AndAnother=new MyProperty();

  private readonly Dictionary<MyProperty, string> myData=new Dictionary<MyProperty, string>();

  public string GetData(MyProperty property) {
    return myData[property];
  }
}

//default implementation of Equals and GetHashCode are fine here
public class MyProperty {}

HOWEVER, since your target application is really about collecting a set of dynamic and user configurable property getters, maybe you really want to make some Funcs? Code like the below will be very fast, and it still has the ability you want, namely it allows you to make a little dynamic, user-configurable list of property getters.

public static void Test2() {
  SomeClass[] SomeClasses; //created somehow

  //in real life, this would be determined dynamically
  var getters=new[] {SomeClass.FirstField, SomeClass.AnotherField, SomeClass.AndAnother};
  foreach(var sc in SomeClasses) {
    foreach(var getter in getters) {
      System.Console.WriteLine(getter(sc));
    }
  }
}

public class SomeClass {
  public static readonly Func<SomeClass, string> FirstField=sc => sc.field0;
  public static readonly Func<SomeClass, string> AnotherField=sc => sc.field1;
  public static readonly Func<SomeClass, string> AndAnother=sc => sc.field2;

  private string field0;
  private string field1;
  private string field2;
}
Corey Kosak
  • 2,615
  • 17
  • 13
  • Thanks for reply and code examples. I do have a class representing the fields/properties, and I did wonder if I could use instances of that class for the dictionary lookup without running into trouble. The func solution didn't occur to me, very enlightening, thanks! To me, this is as much about learning C# than solving the actual problem. – Larsp Jul 10 '11 at 09:57
1

First, if you're not sure this actually is a performance problem for you, then yes, you are chasing performance ghosts and your current implementation is fine.

But if you found out during profiling that you really need to make this code faster, then your seems fine. “Arrays are evil” is mostly true only in public interfaces, it's fine to use them for implementation.

One thing I would change about your code though: create an enum containing the fields and use that instead of int. It's just as fast and much more readable. If the fields are not known at compile time, using int is fine. If you do know some of the fields at compile time, you could use static properties for them.

svick
  • 236,525
  • 50
  • 385
  • 514
  • Actually, I did make an enum for the fields that are non dynamic, only to be annoyed by the compiler requiring me to cast the enum members to int before looking up in the array :) – Larsp Jul 09 '11 at 21:40
  • @Larsp: Yep, that is annoying. You can make a static class with a bunch of public constants, but then the compiler won't check for collisions. There's no perfect answer to this. (I guess a code generator could create the constants, without overlap, with `int` type, from a simple list of identifiers, but that makes the build process more complicated) – Ben Voigt Jul 09 '11 at 23:07
  • The static class with public constants is a reasonable alternative, thanks for the tip. – Larsp Jul 10 '11 at 09:11