2

[Note: This question has attracted solutions with low runtime performance, so it's worth emphasizing here that performance is key, in addition to the type safety. See the note at the end for more performance info.]

I am using a few different instances of List<int> in a fairly complex algorithm.

Some of these lists contain indexes into each other, i.e. they provide a level of indirection.

I've fixed a few bugs caused by the wrong indexer being used while accessing a list. Because all the lists are the same type, i.e. List<int>, the compiler doesn't provide any type safety at all. For example:

// The below statement is wrong - it should be list1[list2[x]], 
// as x is an index into list2, not list1. 
// list2 returns indexes into list1.
// But the compiler is oblivious to this.
//
var f = list1[x]; 

So, I started thinking that it may be possible to add a degree of type safety by using strongly typed indexes into each list that just wrap an integer:

/// An index into the first list
struct Index1
{
    public int Value { get; set; }
}

/// An index into the second list
struct Index2
{
    public int Value { get; set; }
}

Then, declaring variables of the correct index type will catch a class of bugs at compile time. (This is not foolproof, that's not what I'm after - better is enough.)

Unfortunately a generic List<T> does not offer a way to use a custom index type - the indexer is always an int.

Is there another way to accomplish what I'm trying to do? A custom collection comes to mind - I don't mind the effort, which will likely pay for itself. But I can't think of one that can be used as shown. (I could create a separate collection type for each indexer, of course - but I'd like to use a single new collection type if possible - because otherwise code duplication starts to become an issue.)

Performance:: The algorithm was rewritten to use lists for performance. We're even considering using arrays since there is one less bounds check. So any proposed solution should have excellent runtime performance - comparable at least to List<T>.

Therefore the struct (or any other technique) should ideally be optimized out after the compile-time type-safety checks.

Use Cases: The use cases are:

  1. Accessing a random element from a list via a type safe indexer.
  2. A for loop over a list, also using a type safe indexer.
  3. Using foreach over a list. List optimizes its GetEnumerator() to return a struct and so avoid allocations. I'd like to preserve that.
bright
  • 4,700
  • 1
  • 34
  • 59
  • Do you need to retain ordering, or do you just need to be able to access elements by index ? – driis Jul 31 '16 at 17:34
  • Do you mean ordering within each collection - yes, that is very important. Why? – bright Jul 31 '16 at 17:35
  • Because if not, I'd suggested using a Dictionary. Then I remembered we already have `SortedDictionary` - see answer. – driis Jul 31 '16 at 17:36
  • You can wrap `List` inside your own class `StrictlyIndexedList` which has a strictly-typed indexer and forwards to the inner `List` after converting the strict indexer to an into . This should inline well. – Raymond Chen Jul 31 '16 at 18:01
  • Raymond - would this avoid boxing the structs, and also avoid virtual calls? Would appreciate a bit more detail. For example - what constraints should TIndex have? Note that I'm using structs in the example above to avoid boxing. – bright Jul 31 '16 at 18:09

3 Answers3

1

You should be able to use SortedDictionary<TKey, TValue> to do what you want. Make sure to implement Equals and GetHashCode on your custom key types.

SortedDictionary allows fast and type safe random access to elements while preserving order.

If you require list semantics to avoid rewriting too much code, it would not be hard to create a list-like type with the SortedDictionary as the backing storage.

driis
  • 161,458
  • 45
  • 265
  • 341
  • You are right, there will be a performance penalty. Will it matter ? It depends on your situation, and only you can tell by benchmarking the actual scenario. – driis Jul 31 '16 at 17:42
  • A dictionary look up for each access is not acceptable, I'm afraid. Further there would likely be virtual method access to Equals and GetHashCode, There may even be boxing of the struct in the process. Yes, I stated in the OP that performance is key. That is a parameter of the question. – bright Jul 31 '16 at 17:42
  • Oh, and if TKey is a struct, I think you can do it without the overloads, which should take care of some of your reservations ... Perhaps see http://stackoverflow.com/a/5927853/13627 – driis Jul 31 '16 at 17:43
  • Dictionary lookups are fast ... Not as fast as an index into a list, but in a list you will be doing bounds checking at each access. – driis Jul 31 '16 at 17:44
  • I'm aware of that, but the performance difference is huge, especially with millions of iterations in an inner loop. That's why I mentioned performance specifically. This algorithm has been rewritten with lists just for that reason. We can't regress noticeably on performance - for type safety. The source code for Dictionary - http://referencesource.microsoft.com/#mscorlib/system/collections/generic/dictionary.cs,bcd13bb775d408f1 - shows that there are at least two calls, two virtual calls (GetHashCode() and Equals()), and various array accesses inside of FindEntry(). Abysmal, I'm afraid. – bright Jul 31 '16 at 17:50
1

I think the root issue you're running into is that with your structs, there's no way for a collection to take a generic parameter, then get Value without adding some interface on both structs (causing a virtual call), having a Func<TIndex, int> in the collection (like Jacob's answer does), or using a more complex collection that abstracts away the idea of an "index".

The direct approach of implementing a full collection for each Index* doesn't necessarily cause code duplication. A few ways are code generation or IL generation. Here's a design-time T4 template that generates code for each index struct along with a class with an indexer:

<#@ template debug="false" hostspecific="false" language="C#" #>
<#@ assembly name="System.Core" #>
<#@ output extension=".cs" #>
using System.Collections.Generic;

<#
string[] indexTypeNames = { "Index1", "Index2" };

foreach (var name in indexTypeNames)
{
    string collectionName = name + "List";
#>
public struct <#= name #>
{
    public int Value { get; set; }
}

public class <#= collectionName #><T>
{
    private List<T> _inner = new List<T>();

    public T this[<#= name #> index]
    {
        get { return _inner[index.Value]; }
        set { _inner[index.Value] = value; }
    }

    // Other desired IList<T> methods.
}

<#
}
#>

When the T4 template is built, it generates this code file:

using System.Collections.Generic;

public struct Index1
{
    public int Value { get; set; }
}

public class Index1List<T>
{
    private List<T> _inner = new List<T>();

    public T this[Index1 index]
    {
        get { return _inner[index.Value]; }
        set { _inner[index.Value] = value; }
    }

    // Other desired IList<T> methods.
}

public struct Index2
{
    public int Value { get; set; }
}

public class Index2List<T>
{
    private List<T> _inner = new List<T>();

    public T this[Index2 index]
    {
        get { return _inner[index.Value]; }
        set { _inner[index.Value] = value; }
    }

    // Other desired IList<T> methods.
}

Obviously you'll want interface implementations on the collections so they have a natural API, but it's pretty clear how you can add these.

Since the code is generated at design time, the duplication isn't a maintenance issue. However, I would classify code gen and IL gen as workarounds for something that the existing type system can't express, so I would also like to see a better answer using only generics.

Community
  • 1
  • 1
31eee384
  • 2,748
  • 2
  • 18
  • 27
  • Appreciate the answer, as well as the concern for performance and the hope for a more elegant solution :) – bright Jul 31 '16 at 19:09
0

Perhaps you could wrap a list-like collection class with something that provides a generic index type:

public class IndexedCollection<TIndex, TValue> : IList<TValue>
{
    private List<TValue> _innerList;
    private Func<TIndex, int> _getIndex;

    public IndexedCollection(Func<TIndex, int> getIndex)
    {
        this._getIndex = getIndex;
        this._innerList = new List<TValue>();
    }

    new public TValue this[TIndex indexObj]
    {
        get 
        {
            var realIndex = this._getIndex(indexObj);
            return _innerList[realIndex];
        }
        set 
        {
            _innerList[this._getIndex(indexObj)] = value;
        }
    }

    // All the other members
}

...then instantiate like this:

var list1 = new IndexedCollection<Index1, Foo>(i => i.Value);
var list2 = new IndexedCollection<Index2, Foo>(i => i.Value);
Jacob
  • 77,566
  • 24
  • 149
  • 228
  • The function call to getIndex() would not be optimized out, so this won't meet the stated performance requirement. – bright Jul 31 '16 at 17:59
  • I don't think it needs to be optimized out to perform well. It's still way less expensive than doing a hash as with the dictionary. But still, not going to perform as well as direct list access, obviously. – Jacob Jul 31 '16 at 22:53