
I have a DTO called FileRecordDto with a property that is a dictionary of strings. The dictionary represents one transaction record: the keys are column names and the values are all strings:

public class FileRecordDto
{
    public IDictionary<string, string> Fields { get; internal set; }
        = new Dictionary<string, string>();
}

I also have an enum wrapper class to represent the column names:

public class FieldName
{
    public enum CRFS
    {
        Amount,
        ActionDate,
        ContractReference,
        CycleDate,
    }
}

Given an input list of FileRecordDtos, I need to remove duplicates, treating two records as the same when four specific fields match. I found a standard approach for this on Stack Overflow (grouping by a composite key), but it seems to break when the key is an array of strings:

var filteredRecords = originalRecords.GroupBy(x => new string[]
{
    x.Fields[nameof(FieldName.CRFS.ContractReference)],
    x.Fields[nameof(FieldName.CRFS.ActionDate)],
    x.Fields[nameof(FieldName.CRFS.Amount)],
    x.Fields[nameof(FieldName.CRFS.CycleDate)]
});

return filteredRecords.Select(y => y.First()).ToList();

Inspecting the keys in the debugger at run time shows that the string arrays for two different records look identical, yet GroupBy treats them as different values.
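The observation above can be reproduced in isolation. This is a minimal sketch with made-up sample values; `ArrayKeyDemo` and its methods are hypothetical names for illustration. Two key arrays with identical contents still form two groups, even though `SequenceEqual` reports their contents as equal.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical demo class: two key arrays with identical contents
// (made-up sample values) but separate array instances.
public static class ArrayKeyDemo
{
    private static List<string[]> Keys()
    {
        return new List<string[]>
        {
            new[] { "REF1", "2017-08-30", "100.00", "2017-08-01" },
            new[] { "REF1", "2017-08-30", "100.00", "2017-08-01" },
        };
    }

    // string[] does not override Equals/GetHashCode, so GroupBy falls back
    // to reference equality and each array instance forms its own group.
    public static int CountGroupsByArray()
    {
        return Keys().GroupBy(k => k).Count();
    }

    // Comparing element by element shows the contents really are equal.
    public static bool ContentsEqual()
    {
        var keys = Keys();
        return keys[0].SequenceEqual(keys[1]);
    }
}
```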

What am I doing wrong?

Riegardt Steyn

3 Answers


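Use an anonymous type as the key. The compiler-generated type overrides Equals and GetHashCode to compare all of its properties by value, so records with identical field values end up in the same group: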

var filteredRecords = originalRecords.GroupBy(x => new 
{
    ContractReference = x.Fields[nameof(FieldName.CRFS.ContractReference)],
    ActionDate        = x.Fields[nameof(FieldName.CRFS.ActionDate)],
    Amount            = x.Fields[nameof(FieldName.CRFS.Amount)],
    CycleDate         = x.Fields[nameof(FieldName.CRFS.CycleDate)]
});
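A small sketch of why this works, assuming plain `Dictionary<string, string>` rows stand in for `FileRecordDto.Fields` and using made-up values; the string keys are the same strings that expressions like `nameof(FieldName.CRFS.Amount)` produce. `AnonymousKeyDemo` is a hypothetical name. Of the three rows, the two with matching fields collapse into one group, so `CountGroups()` returns 2.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical demo class: dictionaries stand in for FileRecordDto.Fields,
// and the sample values are made up.
public static class AnonymousKeyDemo
{
    private static Dictionary<string, string> Row(string reference, string actionDate, string amount, string cycleDate)
    {
        return new Dictionary<string, string>
        {
            ["ContractReference"] = reference,
            ["ActionDate"] = actionDate,
            ["Amount"] = amount,
            ["CycleDate"] = cycleDate,
        };
    }

    // Groups three rows, two of which share all four key fields.
    public static int CountGroups()
    {
        var rows = new List<Dictionary<string, string>>
        {
            Row("REF1", "2017-08-30", "100.00", "2017-08-01"),
            Row("REF1", "2017-08-30", "100.00", "2017-08-01"),
            Row("REF2", "2017-08-30", "250.00", "2017-08-01"),
        };

        // The anonymous type's Equals/GetHashCode cover all four properties,
        // so rows with equal field values produce equal keys.
        return rows.GroupBy(r => new
        {
            ContractReference = r["ContractReference"],
            ActionDate = r["ActionDate"],
            Amount = r["Amount"],
            CycleDate = r["CycleDate"]
        }).Count();
    }
}
```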
Tim Schmelter
  • This is the first thing I tried. Alas, C# 6.0 does not like it: `Invalid anonymous type member declarator. Anonymous type members must be declared with a member assignment, simple name or member access.` – Riegardt Steyn Aug 30 '17 at 14:06
  • Put `a =` in front of the first `x.Fields`, then `b =`, etc. @Heliac. It is just complaining that you haven't specified a property name. – mjwills Aug 30 '17 at 14:10

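You need to define a custom IEqualityComparer<TKey> that compares the arrays by value instead of by reference. Arrays don't override Equals or GetHashCode, so by default GroupBy compares array keys by reference. An implementation of this would be: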

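using System.Collections.Generic;
using System.Linq;

// Compares arrays element by element instead of by reference.
public class ArrayEqualityComparer<T> : IEqualityComparer<T[]>
{
    private static readonly EqualityComparer<T> ElementComparer = EqualityComparer<T>.Default;

    public bool Equals(T[] x, T[] y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        return x.SequenceEqual(y, ElementComparer);
    }

    public int GetHashCode(T[] array)
    {
        if (array == null) return 0;

        // Equal arrays must produce equal hash codes, so combine the
        // element hash codes in order (null elements hash to 0).
        int hc = array.Length;
        for (int i = 0; i < array.Length; ++i)
        {
            hc = unchecked(hc * 314159 + ElementComparer.GetHashCode(array[i]));
        }
        return hc;
    }
}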

Usage:

var filteredRecords = originalRecords.GroupBy(x => new string[]
{
    x.Fields[nameof(FieldName.CRFS.ContractReference)],
    x.Fields[nameof(FieldName.CRFS.ActionDate)],
    x.Fields[nameof(FieldName.CRFS.Amount)],
    x.Fields[nameof(FieldName.CRFS.CycleDate)]
}, new ArrayEqualityComparer<string>());
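If you would rather not hand-roll the hash combination, the framework's `System.Collections.StructuralComparisons.StructuralEqualityComparer` already compares arrays element by element. It is non-generic, so a small adapter is needed before it can be passed to `GroupBy`. The sketch below assumes that approach; `StructuralArrayComparer<T>` is a hypothetical name, not a framework type.

```csharp
using System.Collections;
using System.Collections.Generic;

// Hypothetical adapter: wraps the non-generic
// StructuralComparisons.StructuralEqualityComparer as IEqualityComparer<T[]>.
public sealed class StructuralArrayComparer<T> : IEqualityComparer<T[]>
{
    private static readonly IEqualityComparer Inner = StructuralComparisons.StructuralEqualityComparer;

    // Arrays implement IStructuralEquatable, so this compares them
    // element by element (null arrays are handled by the inner comparer).
    public bool Equals(T[] x, T[] y)
    {
        return Inner.Equals(x, y);
    }

    // Consistent with Equals: equal arrays get equal hash codes.
    public int GetHashCode(T[] obj)
    {
        return Inner.GetHashCode(obj);
    }
}
```

It would be used the same way as the custom comparer: `originalRecords.GroupBy(x => new string[] { ... }, new StructuralArrayComparer<string>())`.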

If the number of fields is fixed, you can use an anonymous object instead:

var filteredRecords = originalRecords.GroupBy(x => new 
{
    ContractReference = x.Fields[nameof(FieldName.CRFS.ContractReference)],
    ActionDate = x.Fields[nameof(FieldName.CRFS.ActionDate)],
    Amount = x.Fields[nameof(FieldName.CRFS.Amount)],
    CycleDate = x.Fields[nameof(FieldName.CRFS.CycleDate)]
}); 
Titian Cernicova-Dragomir

As mentioned by @mjwills, I was just missing the property names when I tried `new { ... }` without the `string[]` part. So this works:

var filteredRecords = originalRecords.GroupBy(x => new
{
    a = x.Fields[nameof(FieldName.CRFS.ContractReference)],
    b = x.Fields[nameof(FieldName.CRFS.ActionDate)],
    c = x.Fields[nameof(FieldName.CRFS.Amount)],
    d = x.Fields[nameof(FieldName.CRFS.CycleDate)]
});

return filteredRecords.Select(y => y.First()).ToList();

For the sake of completeness, the following also fixed my problem with two seemingly identical string arrays being treated as unique; I think it was comparing the array references instead of the string values. Adding up the GetHashCode values worked like a charm:

var filteredRecords = originalRecords.GroupBy(x =>                  
    x.Fields[nameof(FieldName.CRFS.ContractReference)].GetHashCode()
    + x.Fields[nameof(FieldName.CRFS.ActionDate)].GetHashCode()
    + x.Fields[nameof(FieldName.CRFS.Amount)].GetHashCode()
    + x.Fields[nameof(FieldName.CRFS.CycleDate)].GetHashCode());

return filteredRecords.Select(y => y.First()).ToList();
Riegardt Steyn
  • `Adding up the GetHashCode values worked like a charm` - what do you do if two completely different sets of values happen to have hashcodes that, when added, are the same? – mjwills Aug 31 '17 at 04:30
  • The possibility of that is the same as that of any other hash function clashing. What do you do if that is a concern? Write a better hash function. – Riegardt Steyn Aug 31 '17 at 05:33
  • `Write a better hash function.` The point is your code above **isn't** a hash function. **It is a function to determine the key for a `GroupBy`** (which is a completely different thing). If two records have completely different values - and those values happen to have hash codes that when added result in the same value - then they will be grouped together. This is incorrect. Tim's and Titian's solutions do not exhibit this behaviour (i.e. they work). `Guys, if you are going to down-vote people, please provide a proper reason for doing so.` I added my comment at the same time as my downvote. – mjwills Aug 31 '17 at 05:56
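The concern raised in these comments can be made concrete without relying on a rare accidental collision. Addition is commutative, so two records that differ only by having their ActionDate and CycleDate values swapped always get the same summed key. The hash-sum GroupBy therefore merges them, while an anonymous-type key keeps them apart. This is a sketch with made-up values; `HashSumKeyDemo` and its methods are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical demo class; the sample values are made up.
public static class HashSumKeyDemo
{
    // Two different records: ActionDate and CycleDate are swapped.
    // Layout: ContractReference, ActionDate, Amount, CycleDate.
    private static List<string[]> Records()
    {
        return new List<string[]>
        {
            new[] { "REF1", "2017-08-30", "100.00", "2017-08-01" },
            new[] { "REF1", "2017-08-01", "100.00", "2017-08-30" },
        };
    }

    // The key from the answer above: the sum of the four string hash codes.
    // unchecked: let the sum wrap on overflow instead of throwing.
    private static int SumKey(string[] r)
    {
        return unchecked(r[0].GetHashCode() + r[1].GetHashCode()
                         + r[2].GetHashCode() + r[3].GetHashCode());
    }

    // Both records get the same summed key, so they form a single group.
    public static int CountGroupsBySum()
    {
        return Records().GroupBy(r => SumKey(r)).Count();
    }

    // An anonymous-type key compares each field separately: two groups.
    public static int CountGroupsByAnonymousType()
    {
        return Records().GroupBy(r => new
        {
            ContractReference = r[0],
            ActionDate = r[1],
            Amount = r[2],
            CycleDate = r[3]
        }).Count();
    }
}
```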