1

Say we have this pair of structs, part of a widely used interchange format (so we cannot modify the source - and ideally shouldn't want to: we're not trying to alter the data itself):

struct Vector2 { public int x; public int y; }
struct Vector3 { public int x; public int y; public int z; }

And a class whose core is a list of one or the other, and contains many algorithms which are nearly-identical to implement for either struct, but have to reference the extra members in the 3-element struct (the z):

public class Mesh<T>
{
    private List<T> _myVectors;
}

...how do you correctly implement the suite of methods that process them? e.g.:

public int Average()
{
    // if Vector2:
    int sum = 0, count = 0;
    foreach( var v2 in _MyVectors )
    {
        sum += v2.x + v2.y;
        count++;
    }

    // if Vector3:
    int sum = 0, count = 0;
    foreach( var v3 in _MyVectors )
    {
        sum += v3.x + v3.y + v3.z;
        count++;
    }

    return sum/count;
}

Noting especially: struct-specific methods already exist (millions of them), offered by API's that lack any in-built Generics. So, e.g. we can confidently write an algorithm to use a method in FOREIGN_API, knowing that one copy of the source code (but using Generics) will bind to an acceptable implementation either way:

public float FOREIGN_API_Average( Vector2 input );
public float FOREIGN_API_Average( Vector3 input );

The problems I'm trying to wrap my head around here are, approximately:

  1. It is the // if Vector2: conceptual part in the example above that I can't figure out how to do in C#.
  2. I'm simply not sure how to structure this. It feels like I have to do some mildly clever trick around stating "I have arbitrary generics args. But I have some specific versions of this class I will implement internally. (All other non-specific versions are illegal by implication, and I'll implement a set of methods that throw Exceptions)". But... I'm not sure how to pull that off.
  3. Structs cannot "extend" one another. If they were classes I would start by constraining the using-class (Mesh) with a where T : BaseVector, and go from there. But that's not possible with structs.
  4. The structs come from a (billion dollar) piece of software that I don't own; there are plenty of architectural decisions I wish they'd made differently, but TL;DR: got to work with what we've got.
  5. This problem isn't just two structs: to support 3D math, I have to re-implement everything for Vector1, Vector2, Vector3, Vector4... there's a lot of code I don't want to copy/pasta!
  6. ...and the typical Mesh class has 4 internal lists, each of which can be any of those 4 types. If I have to write every combination by hand, and not use Generics, I will have 4^4 = 256 copies of the same code to write and maintain.
Adam
  • 32,900
  • 16
  • 126
  • 153
  • `It is the // if Vector2: conceptual part in the example above that I can't figure out how to do in C#` - that is `if (typeof(T) == typeof(Vector2))`, but usually resorting to this code is a sign something is wrong. In this particular case, where you have to deal with structs specifically, it might not be. So provided that "struct-specific methods already exist" you would have a lot of `if (typeof(T) == typeof(Vector2){FOREIGN_API_Average((Vector2)an_object)})`, which is still code duplication, but of less code. – GSerg Nov 17 '19 at 22:09
  • @GSerg My expectation was that anything to do with runtime type will dead-end, because I won't be able to pass unqualified T objects into other methods (e.g. in foreign API's) that require a compile-time verified type. C# doesn't allow that? – Adam Nov 17 '19 at 22:12
  • I've edited the comment. – GSerg Nov 17 '19 at 22:14
  • The `FOREIGN_API` accepts single vectors as inputs to `Average`? Shouldn't it be a list or an array? – V0ldek Nov 17 '19 at 22:43
  • Are you able to modify the `struct` definitions, or are they set in stone in a third party library? This is not clear to me based on points 3./4., but while structs can't inherit, they can implement common interfaces. – kalimag Nov 17 '19 at 22:54
  • @kalimag good question - set in stone, sadly, or I'd have tried to architect it differently in the first place :) (I've updated the question to clarify this). – Adam Nov 17 '19 at 23:03
  • @V0ldek There's a mixture. For my specific situation, I especially care about the ones that take a whole List. If that makes it easier, great - but I'm also interested in the more general case (I need to support lots of one-instance methods, but I could probably hack some things to workaround those if absolutely needed) – Adam Nov 17 '19 at 23:05
  • @Gserg - I see. But we can't do anything with the methods that want the whole List? (c.f. V0ldek's comment - there's a mix of API methods for: single items, or list-of-items – Adam Nov 17 '19 at 23:46
  • @Adam You cannot cast `List` to `List`, but you can `var casted = _myVectors as List` and pass that. – GSerg Nov 18 '19 at 00:27

4 Answers4

1

Truly makes you envy C++ and their stupid sexy templates, doesn't it?

Some assumptions first (correct them if they are wrong):

You've said that the mesh type can have four different lists, so I'll assume its signature is Mesh<T1, T2, T3, T4>. I'm also assuming you control this type, but not the VectorN types.

The issue is that you're lacking any generic support for Vectors and you cannot use polymorphism on them in any way. As you've said, wrapping them in an interface or introducing custom classes as wrappers will kill the performance.

So the thing you want to do is a variation on double-dispatch - call a different method based on the type of its arguments.

The simplest thing that comes to mind is a static wrapper for the existing FOREIGN_API calls:

public static class VectorExtensions
{
    public static int Sum<TVector>(this IEnumerable<TVector> vectors)
    {
        var type = typeof(TVector);
        if (type == typeof(Vector1))
        {
            return FOREIGN_API.Sum((IEnumerable<Vector1>)vectors);
        }
        else if (type == typeof(Vector2))
        {
            return FOREIGN_API.Sum((IEnumerable<Vector2>)vectors);
        }
        else if (...) // etc.

        throw new ArgumentException($"Invalid type of vector {typeof(TVector).Name}.");
    }
}

Then, implementing an Average on a mesh is easy (I'm assuming an average is an average of all lists combined):

public class Mesh<T1, T2, T3, T4>
{
    private List<T1> _myVectors1;
    private List<T2> _myVectors2;
    private List<T3> _myVectors3;
    private List<T4> _myVectors4;

    public float Average()
    {
        var sum1 = _myVectors1.Sum();
        var sum2 = _myVectors2.Sum();
        var sum3 = _myVectors3.Sum();
        var sum4 = _myVectors4.Sum();

        return (float)(sum1 + sum2 + sum3 + sum4) / 
            (_myVectors1.Count + _myVectors2.Count + _myVectors3.Count + _myVectors4.Count);
    }
}

This form of typechecking should be fast, as C# heavily optimizes typeof calls.

There is a simpler way of writing this that involves dynamic:

public static class VectorExtensions
{
    public static int Sum<TVector>(this IEnumerable<TVector> vectors) =>
        FOREIGN_API.Sum((dynamic)vectors);
}

The dynamic infrastructure is also faster than many expect due to caching, so you might want to give this solution a try first and then think about something else only when the performance is diagnosed to be an issue. As you can see this takes a ridiculously small amount of code to try out.

=============================================================================

Now let's assume we're looking for the most performant way possible. I'm pretty convinced that there's no way of entirely avoiding runtime typechecking. In the above case note, that there are only a handful of typechecks per method invocation. Unless you're calling the Mesh<,,,> methods millions of times, that should be fine. But assuming that you might want to do that, there's a way to trick our way out of this.

The idea is to perform all the typechecks required the moment you instantiate a mesh. Let us define helper types that we will call VectorOperationsN for all possible N in VectorN types. It will implement an interface IVectorOperations<TVector> that will define basic vector operations you want to have. Let's go with Sum for one or many vectors for now, just as examples:

public interface IVectorOperations<TVector>
{
    public int Sum(TVector vector);

    public int Sum(IEnumerable<TVector> vectors);
}

public class VectorOperations1 : IVectorOperations<Vector1>
{
    public int Sum(Vector1 vector) => vector.x;

    public int Sum(IEnumerable<Vector1> vectors) => vectors.Sum(v => Sum(v));
}


public class VectorOperations2 : IVectorOperations<Vector2>
{
    public int Sum(Vector2 vector) => vector.x + vector.y;

    public int Sum(IEnumerable<Vector2> vectors) => vectors.Sum(v => Sum(v));
}

Now we need a way to get the appropriate implementation - this will involve the typecheck:

public static class VectorOperations
{
    public static IVectorOperations<TVector> GetFor<TVector>()
    {
        var type = typeof(TVector);

        if (type == typeof(Vector1))
        {
            return (IVectorOperations<TVector>)new VectorOperations1();
        }
        else if (...) // etc.

        throw new ArgumentException($"Invalid type of vector {typeof(TVector).Name}.");
    }
}

Now when we create a mesh we will get an appropriate implementation and then use it all throught our methods:

public class Mesh<T1, T2, T3, T4>
{
    private List<T1> _myVectors1;
    private List<T2> _myVectors2;
    private List<T3> _myVectors3;
    private List<T4> _myVectors4;
    private readonly IVectorOperations<T1> _operations1;
    private readonly IVectorOperations<T2> _operations2;
    private readonly IVectorOperations<T3> _operations3;
    private readonly IVectorOperations<T4> _operations4;

    public Mesh()
    {
        _operations1 = VectorOperations.GetFor<T1>();
        _operations2 = VectorOperations.GetFor<T2>();
        _operations3 = VectorOperations.GetFor<T3>();
        _operations4 = VectorOperations.GetFor<T4>();
    }

    public float Average()
    {
        var sum1 = _operations1.Sum(_myVectors1);
        var sum2 = _operations2.Sum(_myVectors2);
        var sum3 = _operations3.Sum(_myVectors3);
        var sum4 = _operations4.Sum(_myVectors4);

        return (float)(sum1 + sum2 + sum3 + sum4) / 
            (_myVectors1.Count + _myVectors2.Count + _myVectors3.Count + _myVectors4.Count);
    }
}

This works and does a typecheck only when instantiating the mesh. Success! But we can optimize this further using two tricks.

One, we don't need new instances of IVectorOperations<TVector> implementations. We can make them singletons and never instantiate more than one object for one type of vector. This is perfectly safe as the implementations are always stateless anyway.

public static class VectorOperations
{
    private static VectorOperations1 Implementation1 = new VectorOperations1();
    private static VectorOperations2 Implementation2 = new VectorOperations2();
    ... // etc.

    public static IVectorOperations<TVector> GetFor<TVector>()
    {
        var type = typeof(TVector);

        if (type == typeof(Vector1))
        {
            return (IVectorOperations<TVector>)Implementation1;
        }
        else if (...) // etc.

        throw new ArgumentException($"Invalid type of vector {typeof(TVector).Name}.");
    }
}

Two, we don't really need to typecheck every time we instantiate a new mesh. It's easy to see that the implementations stay the same for every object of a mesh type with equal type arguments. They are static in terms of a single closed generic type. Therefore, we really can make them static:

public class Mesh<T1, T2, T3, T4>
{
    private List<T1> _myVectors1;
    private List<T2> _myVectors2;
    private List<T3> _myVectors3;
    private List<T4> _myVectors4;
    private static readonly IVectorOperations<T1> Operations1 =
        VectorOperations.GetFor<T1>();
    private static readonly IVectorOperations<T2> Operations2 =
        VectorOperations.GetFor<T2>();
    private static readonly IVectorOperations<T3> Operations3 =
        VectorOperations.GetFor<T3>();
    private static readonly IVectorOperations<T4> Operations4 =
        VectorOperations.GetFor<T4>();

    public float Average()
    {
        var sum1 = Operations1.Sum(_myVectors1);
        var sum2 = Operations2.Sum(_myVectors2);
        var sum3 = Operations3.Sum(_myVectors3);
        var sum4 = Operations4.Sum(_myVectors4);

        return (float)(sum1 + sum2 + sum3 + sum4) / 
            (_myVectors1.Count + _myVectors2.Count + _myVectors3.Count + _myVectors4.Count);
    }
}

This way, if there are N different vector types, we only ever instantiate N objects implementing IVectorOperations<> and perform exactly as many additional type checks as there are different mesh types, so at most 4^N. Individual mesh objects don't take any additional memory, but there are again at most 4^N * 4 references to vector operation implementations.

This still forces you to implement all the vector operations four times for different types. But note that now you've unlocked all options - you have a generic interface that depends on the TVector type that you control. Any tricks inside your VectorOperations implementations are allowed. You can be flexible there while being decoupled from the Mesh by the IVectorOperations<TVector> interface.

Wow this answer is long. Thanks for coming to my TED talk!

V0ldek
  • 9,623
  • 1
  • 26
  • 57
  • Working my way through (thanks!) ... I didn't think your first code example was possible. Indeed, when I try it, I get: "error CS0030: Cannot convert type 'System.Collections.Generic.List' to 'System.Collections.Generic.List'" ... is there something I'm missing there? (I even tried a direct copy/paste into a new file, after my attempt to reproduce the same design failed with that same compiler error). NB: the API I'm trying it with first requires a List, won't accept IEnumerable. Yay for API authors ;) – Adam Nov 18 '19 at 00:24
  • 1
    @Adam It works with an `IEnumerable`. For a concrete `List` I'd suggest either casting with `as List` or using the `dynamic` solution from later, which will work with whatever type you throw at it. – V0ldek Nov 18 '19 at 00:30
  • I especially like the VectorOperations approach. It seems this is probably the most C#-ish way to deal with the situation (although 'dynamic' comes a close second - it's not what it was invented for, but IMHO it's compatible with the spirit of dynamic: to allow easy interop with foreign data types). I'm sad that there's no way of dealing with this situation using generics itself - feels like a large missing feature (polymorphism of generic arguments?) – Adam Nov 18 '19 at 13:46
  • This solution works, althouh it has also pushed the problem out wider: How do other classes interoperate with the Mesh class? Sigh. Typically other classes support specific combinations of the generic args - e.g. "OtherClass" conceptually is "OtherClass" - which I believe is the fundamental issue I had in the first place :). But for other classes (i.e. not Mesh) in many cases I can do internal conversions with only a few helper methods, and for the others I'll just re-use the same solution used inside the Mesh class. Thanks! – Adam Nov 18 '19 at 13:51
  • Hmm. Harder than I thought. I just found out that some of the processing classes *cannot* be generic classes (limitations of the huge legacy codebase), and so they *cannot* accept any object, anywhere, that uses generics - so my Mesh class cannot use generics (unless I make it 100% dynamic, which would be a nightmare for coding + maintenance). Ugh. I don't love C# generics. So much dependency on "everyone else has to architect perfect code that already uses generics, or else you cannot use generics yourself; and by the way: Generics doesn't really support structs. Have fun!" :(. – Adam Nov 18 '19 at 14:40
  • 1
    What do you mean it cannot accept any type using generics? Maybe you can extract a nongeneric interface for the Mesh, something like the nongeneric `IEnumerable`. – V0ldek Nov 18 '19 at 16:20
  • Ah. Yes, of course. You're right I should be able to handle that easily by extracting one (or a couple) of interfaces. I think it'll end up quite clean this way. I got so wrapped up in the complexities of this that I've stopped being able to think clearly about simple things. – Adam Nov 18 '19 at 18:59
  • Update: "dynamic" apparently isn't useable when writing cross-platform code because Apple has banned it :( (it falls under their definition of runtime-code-generation). – Adam Nov 27 '19 at 18:32
0

(I don't think this works, but it's the direction I tried to go in at first - comments welcome, maybe it'll inspire someone else to provide a better answer:)

I thought I could do something like (possible in C++, if I remember correctly, and C# has no direct equivalent for the fully general case, but I figured that for simple cases like this there might be an equivalent):

public class Mesh<T1,T2>
{
 // This class is basically going to fail at runtime:
 //   it cannot/will not prevent you from instancing it
 //   as - say - a Mesh<string,int> - which simply cannot
 //   be sensibly implemented.
 //
 // So: many methods will throw Exceptions - but some can be implemented
 //   (and hence: shared amongst all the other variants of the class)

     public List<T1> internalList;
     public int CountElements<List<T1>>() { return internalList.Count; }
     public int DoSomethingToList1<T1>() { ... }
}

public class Mesh<Vector2,T2>
{
     // Now we're saying: HEY compiler! I'll manually override the
     //    generic instance of Mesh<T1,T2> in all cases where the
     //    T1 is a Vector2!

     public int DoSomethingToList1<Vector2>() { ... }
}

Or another attempt to find a syntactically valid way to do the same thing (c.f. @Gserg's comment to the main question) - but obviously this fails because C# compiler forbids arbitrary type-casting:

    private List<T1> data;
    public void Main()
    {
        if( typeof(T1) == typeof(Vector2) )
            Main( (List<Vector2>) data );
        else if( typeof(T1) == typeof(Vector3) )
            Main( (List<Vector3>) data );
    }

    public void Main( List<Vector2> dataVector2s )
    {
        ...
    }
    public void Main( List<Vector3> dataVector3s )
    {
        ...
    }
Adam
  • 32,900
  • 16
  • 126
  • 153
0

I'm not sure if this is what you want, but perhaps you could solve this with a little runtime compilation. For example, you could generate a delegate that sums the fields of a struct;

        public Func<T, int> Sum { get; private set; }
        public void Compile()
        {
            var parm = Expression.Parameter(typeof(T), "parm");
            Expression sum = null;

            foreach(var p in typeof(T).GetFields())
            {
                var member = Expression.MakeMemberAccess(parm, p);
                sum = sum == null ? (Expression)member : Expression.Add(sum, member);
            }
            Sum = Expression.Lambda<Func<T, int>>(sum, parm).Compile();
        }

Or perhaps just a method that turns the struct into some other kind of enumerable that's easier to work with.

Jeremy Lakeman
  • 9,515
  • 25
  • 29
  • Runtime compilation can solve any problem, eh? :). It feels like it's working outside / against C#, so I'm hoping for an approach that works within the language and I can re-use in future. (also, in my specific case today: sadly one of my target platforms is iOS, where runtime compilation is banned) – Adam Nov 18 '19 at 13:30
  • Then perhaps hand write ToVector / FromVector extension methods for each supported struct? Then your calculations can be based on a common type. Hopefully the compiler can still specialise & inline everything where appropriate for performance. – Jeremy Lakeman Nov 19 '19 at 00:53
-1

My advise would be to have Vector2 and Vector3 bring their own processing methods. Interfaces are the droid you are looking for:

  • Have them implement a interface with the functions you need
  • use that interface as the type for anything handed into your list
  • Call the interface functions

A fitting name for the show process would be "Sumable".

Those might be the builtin implementations of the Vector struct. Of course those two can not be inherited. But the MVVM way of doing things is: "If you can not inherit or modify it, wrap it into something you can inherit and modify."

A simple wrapper (it can be a struct or class) around one of those vectors that implements the interface is all you need.

Another option would be to use LINQ for the processing. If it is only a one-off process, it is often a lot more lightweight then going all the way into inhertiance, classes, interfaces and the like.

Christopher
  • 9,634
  • 2
  • 17
  • 31
  • 1
    They are structs primarily (I believe) for performance reasons. A wrapper object will simply kill the performance, at which point I might as well have done a mem-copy of all data into an easier type (classes, for a start!) and work with it there. – Adam Nov 17 '19 at 21:56
  • PS: I thought of using Extension Methods to do what you describe, but ... apparently C# doesn't fully support this (according to another answer: Extension Methods cannot operate on struct-references - https://stackoverflow.com/questions/4656222/extension-methods-on-a-struct ) ... again: if we pass in by value, there's going to be horrendous amounts of memory-copying going on :(. – Adam Nov 17 '19 at 21:58
  • @Adam Extension Methods work fully on collections of structs, wich are still just reference types. Also the answer says the oppostie: Extension Methods do work on structs fully. https://stackoverflow.com/a/4656247/3346583 – Christopher Nov 17 '19 at 22:01
  • @Adam: As for the "speed advatange" of structs, there are two articles worth reading: 1. the Performance Rant, starting with Point 2: https://ericlippert.com/2012/12/17/performance-rant/ | 2. The Stack is a implementation detail: https://blogs.msdn.microsoft.com/ericlippert/2009/04/27/the-stack-is-an-implementation-detail-part-one/ | Both are from Erich Lippert. He knows his stuff. – Christopher Nov 17 '19 at 22:03
  • 2
    The clue is in the struct names: anything using a Vector or Mesh class is almost guaranteed to be outside the "vast majority" that Eric is simplifying for. His "vast" majority isn't even the majority in some industries :). In this case: a "small ish" run is processing circa 0.1 billion items a second - quite standard numbers when working with meshes :). – Adam Nov 17 '19 at 22:08
  • @Adam If you count operations in the Billions per second, this goes into the area of realtime programming. And a JiT compiled, Garabge Collected runtime is not the right environment for realtime applications (for the same reasons writing meaningfull benchmarks for it is hard). You will propably run into way worse issues with that kind of performance requirement. – Christopher Nov 17 '19 at 22:16
  • I agree with you that - if I had the choice of architecting the whole ecosystem from ground up - I would have tried to do it a different way. But I believe it's immaterial to the question, where the constraint is "we're being supplied with data in structs, and everyone is already hardcoded to receive structs from us, so we have to work within/around that" – Adam Nov 17 '19 at 23:08
  • @Adam Question: Do you ever actually receive anything other the Vector2 and Vector3? If not, generics does not sound remotely like the droid you should be looking for. – Christopher Nov 18 '19 at 12:51
  • Vector1, Vector2, Vector3, and Vector4. If it were only two of those, I would have bitten the bullet (indeed, this is what a lot of people have done in the past - only implemented 3-5 of the 256 variations, and then when they need one of the missing 250-something implementations, take a few days to go back and rewrite the one they need, and debug it, etc. But that's messy and horrible and feels like the language is working against us instead of for us :)) – Adam Nov 18 '19 at 13:54
  • @Adam It is a conflict, but between you and strong Typisation. And I can not think of a **performant** way to get around it. You either have the overhead of a extra types, a function call or the need to write a lot of repetitive code in in *all conceivable solutions*. Down near the metal, it still comes down to jumps to another piece of code or a function call. – Christopher Nov 18 '19 at 14:44