
I have a custom class, shown below:

internal class RecurringClusterModel
{
    public int? From { get; set; }
    public int? To { get; set; }
    public string REC_Cluster_1 { get; set; }
    public string REC_Cluster_2 { get; set; }
    public string REC_Cluster_3 { get; set; }
    public string REC_Cluster_4 { get; set; }
    public string REC_Cluster_5 { get; set; }
    public string REC_Cluster_6 { get; set; }
    public string REC_Cluster_7 { get; set; }
    public string REC_Cluster_8 { get; set; }
    public string REC_Cluster_9 { get; set; }
    public string REC_Cluster_10 { get; set; }
    // ... and so on, up to REC_Cluster_50
}

I have a list of this class:

List<RecurringClusterModel> recurringRecords = new List<RecurringClusterModel>();

The data can look like this:
recurringRecords[0].REC_Cluster_1 = "USA";
recurringRecords[0].REC_Cluster_2 = "UK";
recurringRecords[0].REC_Cluster_3 = "India";
recurringRecords[0].REC_Cluster_4 = "France";
recurringRecords[0].REC_Cluster_5 = "China";


recurringRecords[1].REC_Cluster_1 = "France";
recurringRecords[1].REC_Cluster_2 = "Germany";
recurringRecords[1].REC_Cluster_3 = "Canada";
recurringRecords[1].REC_Cluster_4 = "Russia";
recurringRecords[1].REC_Cluster_5 = "India";

....

I want to find the duplicate values across all the cluster properties. This is just a subset: I have 50 properties, up to REC_Cluster_50. I want to find out which countries are duplicated across the 50 cluster properties of all the records in the list.

So in this case India and France are duplicated. I could group by each individual property and find the duplicates by their count, but then I'd have to do it for all 50 REC_Cluster properties. Is there a better way of doing it?

Thanks

SP1
  • Why do you have numbered properties to begin with? And in your second code block, shouldn't the second group of indexes be `[1]` instead of `[0]`? Do you want only to find records where the same cluster index holds the same value, or if `rec[0].REC_Cluster_1` equals `rec[42].REC_Cluster_12` then is it also considered to be duplicated? – CodeCaster Apr 17 '19 at 11:22
  • What have you tried yourself so far? – Mokuyobi Apr 17 '19 at 11:23
  • I am mapping the data from an Excel file for processing and the Excel file has 50 columns so that's why it is this way. – SP1 Apr 17 '19 at 11:23
  • Regarding "if rec[0].REC_Cluster_1 equals rec[42].REC_Cluster_12, is it also considered to be duplicated": yes, it is also considered a duplicate. – SP1 Apr 17 '19 at 11:24
  • It's better to store those in an array or list, which makes it much easier to find the duplicates. Otherwise, oh well, lots of `if`s. – Paul Karam Apr 17 '19 at 11:25
  • How about adding a property `IList AllClusters` to your model whose getter creates a list with all the values? Then you can just compare two lists. – Lennart Stoop Apr 17 '19 at 11:26
  • That is what I am doing right now: looping through the whole list, saving the data in an array and then finding the duplicates, which works fine. I was just interested in whether there is a better approach with LINQ. – SP1 Apr 17 '19 at 11:26
  • There is something weird in the design here: REC_Cluster_1, REC_Cluster_2, ... as incrementally numbered property names? Why isn't it an array? – xdtTransform Apr 17 '19 at 11:35
  • Aside from the weird things, the only path will be to use reflection to look for every property whose name starts with the common prefix and then group these on their value. By the way, how do you handle deletion in this? If properties 1, 2 and 3 are exact duplicates, does property 4 have to move to property 2, or must 2 and 3 remain empty? – xdtTransform Apr 17 '19 at 11:37
  • The properties follow the incrementing pattern REC_Cluster_1, REC_Cluster_2, ... because I am importing data from an Excel file that has 50 columns. To map it to my class I created 50 properties with the same names as the Excel columns, and then I can do `var setupClusterPrices = package.Workbook.Worksheets["Set-up Clusters"]; setupRecords = setupClusterPrices.ToList(out priceRowIDsWithBlankSID);` so the system maps the Excel columns to my incrementing properties. – SP1 Apr 17 '19 at 11:39
  • @xdtTransform I was heading in this direction, which then gives you a dictionary of property name to value, but if the class structure is owned by the OP, then it's best to use a collection of clusters. – reckface Apr 17 '19 at 11:41
  • As a guy who works primarily with weird data: when a customer sends an Excel or CSV file, you have one class for their file and one real DTO class to work on. Going from properties 1, 2, 3 to a HashSet should be the job of the "deserializer". – xdtTransform Apr 17 '19 at 11:43
  • @reckface, going for the right type is always a good solution. Perhaps we can go a step further and make it a HashSet so we don't have to care about duplicates. – xdtTransform Apr 17 '19 at 11:50
  • I know my comment will look like "Do B" when you ask "How to do A". But going from A to B here should be a simple projection from one type to another. And if you really need to go the reflection way, you can use [this](https://stackoverflow.com/q/737151/9260725). – xdtTransform Apr 17 '19 at 11:51
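The reflection route mentioned in the comments can be sketched as follows. This is a minimal example, assuming every cluster property is a `string` whose name starts with `REC_Cluster_`; `ClusterValues` is a hypothetical helper name, and anonymous objects holding the question's two sample rows stand in for `RecurringClusterModel` so the sketch fits in one file. The key idea is to flatten all cluster values of all records into one sequence and group once, rather than grouping each of the 50 properties separately.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// The question's two sample rows. Anonymous objects stand in for RecurringClusterModel;
// GetProperties() works the same way on the real class.
var records = new[]
{
    new { From = (int?)null, To = (int?)null, REC_Cluster_1 = "USA", REC_Cluster_2 = "UK",
          REC_Cluster_3 = "India", REC_Cluster_4 = "France", REC_Cluster_5 = "China" },
    new { From = (int?)null, To = (int?)null, REC_Cluster_1 = "France", REC_Cluster_2 = "Germany",
          REC_Cluster_3 = "Canada", REC_Cluster_4 = "Russia", REC_Cluster_5 = "India" },
};

// Hypothetical helper: yields every non-empty string property whose name starts
// with "REC_Cluster_". From/To are skipped because of the name and type checks.
// Property order is not guaranteed, which does not matter for counting.
IEnumerable<string> ClusterValues(object record) =>
    record.GetType()
          .GetProperties()
          .Where(p => p.Name.StartsWith("REC_Cluster_") && p.PropertyType == typeof(string))
          .Select(p => p.GetValue(record) as string ?? "")
          .Where(v => v.Length > 0);

// Flatten every cluster of every record, then group once across all of them.
var duplicates = records
    .SelectMany(r => ClusterValues(r))
    .GroupBy(country => country)
    .Where(g => g.Count() > 1)
    .Select(g => g.Key)
    .OrderBy(country => country)
    .ToList();

Console.WriteLine(string.Join(", ", duplicates)); // France, India
```

Projecting each row once into a `List<string>` or `HashSet<string>`, as suggested in the comments, avoids repeating the reflection every time you query the data.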

2 Answers


Since you want to capture the From and To, I suggest you structure your class like this:

internal class RecurringClusterModel
{        
    public int? From { get; set; }       
    public int? To { get; set; }
    public IEnumerable<string> REC_Clusters { get; set; }
}

Then you can search for duplicates:

var dupes = recs
.Select(r => new
{
    r.From,
    r.To,
    DuplicateClusters = r.REC_Clusters.GroupBy(c => c)
          .Where(g => g.Count() > 1) // duplicates
          .SelectMany(g => g)  // flatten it back
          .ToArray() // indexed
})
.Where(r => r.DuplicateClusters.Any()) //only interested in clusters with duplicates
.ToArray();
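To see what this per-record query reports, here is a minimal sketch that runs it on the question's two sample rows plus one hypothetical third row that repeats a value within itself. Records are modelled as tuples with the same `From`, `To` and `REC_Clusters` members so the sketch fits in one file; the third row is an assumption for illustration only.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Tuples stand in for the restructured RecurringClusterModel (From, To, REC_Clusters).
var recs = new List<(int? From, int? To, IEnumerable<string> REC_Clusters)>
{
    (null, null, new[] { "USA", "UK", "India", "France", "China" }),
    (null, null, new[] { "France", "Germany", "Canada", "Russia", "India" }),
    (null, null, new[] { "Spain", "Italy", "Spain" }), // hypothetical row with an internal repeat
};

// Same query as in the answer: duplicates are looked for inside each record only.
var dupes = recs
    .Select(r => new
    {
        r.From,
        r.To,
        DuplicateClusters = r.REC_Clusters
            .GroupBy(c => c)
            .Where(g => g.Count() > 1)
            .SelectMany(g => g)
            .ToArray()
    })
    .Where(r => r.DuplicateClusters.Any())
    .ToArray();

Console.WriteLine(dupes.Length);                                  // 1
Console.WriteLine(string.Join(", ", dupes[0].DuplicateClusters)); // Spain, Spain
```

India and France are not reported here, because they repeat across records rather than within one record; that is the case the EDIT below addresses.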

EDIT

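If you want the duplicates across all records, then it will be:

var allDupes = recs
    .SelectMany(r => r.REC_Clusters) // flatten the clusters of every record into one sequence
    .GroupBy(c => c)
    .Where(g => g.Count() > 1)       // values that occur more than once
    .Select(g => g.Key)              // one entry per duplicated value
    .ToArray();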

But now you lose track of From and To.
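If you need the cross-record duplicates and still want to know which From/To range each one came from, one option is to carry From and To along while flattening, in the spirit of the other answer. A minimal sketch follows; the tuple shape mirrors the restructured class, and the From/To values are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Tuples stand in for the restructured RecurringClusterModel; From/To values are hypothetical.
var recs = new List<(int? From, int? To, IEnumerable<string> REC_Clusters)>
{
    (1, 10, new[] { "USA", "UK", "India", "France", "China" }),
    (11, 20, new[] { "France", "Germany", "Canada", "Russia", "India" }),
};

// Flatten to (Cluster, From, To) so every value remembers the record it came from,
// then group on the cluster value across all records.
var allDupes = recs
    .SelectMany(r => r.REC_Clusters.Select(c => (Cluster: c, r.From, r.To)))
    .GroupBy(x => x.Cluster)
    .Where(g => g.Count() > 1)
    .ToDictionary(
        g => g.Key,
        g => g.Select(x => (x.From, x.To)).ToList());

// Print each duplicated value with the From-To ranges of the records containing it.
foreach (var pair in allDupes)
{
    var ranges = pair.Value.Select(x => $"{x.From}-{x.To}");
    Console.WriteLine($"{pair.Key}: {string.Join(", ", ranges)}");
}
```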

reckface

I would add an enumerable property to your class that yields the values of all the cluster properties, skipping empty ones:

internal class RecurringClusterModel
{
    public string REC_Cluster_1 { get; set; }
    public string REC_Cluster_2 { get; set; }
    public string REC_Cluster_3 { get; set; }

    public IEnumerable<string> Clusters => GetAllClusters();

    private IEnumerable<string> GetAllClusters()
    {
        if (!string.IsNullOrEmpty(REC_Cluster_1))
            yield return REC_Cluster_1;

        if (!string.IsNullOrEmpty(REC_Cluster_2))
            yield return REC_Cluster_2;

        if (!string.IsNullOrEmpty(REC_Cluster_3))
            yield return REC_Cluster_3;
    }
}

With this you can flatten the list into the individual clusters and then group by value. If you need the original object back, you have to carry it along while flattening. Here is an example:

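// _Locations and _Random are not defined in the original snippet;
// these sample values are assumed so the example runs.
var _Random = new Random();
var _Locations = new List<string> { "USA", "UK", "India", "France", "China", "Germany", "Canada", "Russia" };

// Build 10 models with randomly chosen locations.
var clusters = Enumerable
    .Range(1, 10)
    .Select(_ => new RecurringClusterModel
    {
        REC_Cluster_1 = _Locations[_Random.Next(_Locations.Count)],
        REC_Cluster_2 = _Locations[_Random.Next(_Locations.Count)],
        REC_Cluster_3 = _Locations[_Random.Next(_Locations.Count)],
    })
    .ToList();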

var dictionary = clusters
    // Flatten the list and preserve original object
    .SelectMany(model => model.Clusters.Select(cluster => (cluster, model)))
    // Group by flattened value and put original object into each group
    .GroupBy(node => node.cluster, node => node.model)
    // Take only groups with more than one element (duplicates)
    .Where(group => group.Skip(1).Any())
    // Depending on further processing you could put the groups into a dictionary.
    .ToDictionary(group => group.Key, group => group.ToList());

foreach (var cluster in dictionary)
{
    Console.WriteLine(cluster.Key);

    foreach (var item in cluster.Value)
    {
        Console.WriteLine("   " + String.Join(", ", item.Clusters));
    }
}
Oliver