How to remove reverse duplicates in a C# Tuple list

Question

Say I have a Tuple List like this:

    List<Tuple<string, string>> conflicts = new List<Tuple<string, string>>();
    conflicts.Add(new Tuple<string, string>("Maths", "English"));
    conflicts.Add(new Tuple<string, string>("Science", "French"));
    conflicts.Add(new Tuple<string, string>("French", "Science"));
    conflicts.Add(new Tuple<string, string>("English", "Maths"));

I want to check the Tuple List for reverse duplicates and remove them. How would I go about doing this with a loop?

NOTE: by reverse duplicates I mean the recurrence of "English", "Maths" and "Maths", "English"

NOTE: My Tuple in my code is populated using SqlDataReader, but the example I used above is pretty close to the way it's laid out.

This seems like it would be really simple, but it has had me stumped all night.

Do you want to remove both duplicates, or leave one? – Wai Ha Lee Mar 10 '16 at 22:34

Eser · Answer 1 · 2016-03-10T23:21:08.450

With a custom IEqualityComparer that treats both orderings of a pair as equal:

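using System;
using System.Collections.Generic;
using System.Linq;

// Treats ("Maths", "English") and ("English", "Maths") as the same pair.
public class TupleComparer : IEqualityComparer<Tuple<string, string>>
{
    public bool Equals(Tuple<string, string> x, Tuple<string, string> y)
    {
        // Equal if the items match in the same order or in reverse order.
        return  (x.Item1 == y.Item1 && x.Item2 == y.Item2) ||
                (x.Item1 == y.Item2 && x.Item2 == y.Item1);
    }

    public int GetHashCode(Tuple<string, string> obj)
    {
        // Sort the two items ordinally before hashing so both orderings get the same hash code.
        return string.Concat(new string[] { obj.Item1, obj.Item2 }.OrderBy(x => x, StringComparer.Ordinal)).GetHashCode();
        //or
        //return (string.CompareOrdinal(obj.Item1, obj.Item2) < 0 ? obj.Item1 + obj.Item2 : obj.Item2 + obj.Item1).GetHashCode(); 
    }
}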

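You can then use a HashSet<Tuple<string, string>> with this comparer instead of a List<Tuple<string, string>>. Add does not insert a pair whose reverse is already in the set (it returns false):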

var conflicts = new HashSet<Tuple<string, string>>(new TupleComparer());
conflicts.Add(new Tuple<string, string>("Maths", "English"));
conflicts.Add(new Tuple<string, string>("Science", "French"));
conflicts.Add(new Tuple<string, string>("French", "Science"));
conflicts.Add(new Tuple<string, string>("English", "Maths"));
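If the pairs are already in a List (for example, filled from a SqlDataReader as in the question), the same idea also works with LINQ's Distinct and a comparer. The sketch below is self-contained, so it declares its own comparer. The names UnorderedPairComparer and PairDedup are hypothetical. The comparer combines the two hash codes with XOR instead of concatenating the sorted strings. XOR is order-independent, so (A, B) and (B, A) hash the same.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical comparer: treats (A, B) and (B, A) as the same pair.
public sealed class UnorderedPairComparer : IEqualityComparer<Tuple<string, string>>
{
    public bool Equals(Tuple<string, string> x, Tuple<string, string> y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x is null || y is null) return false;
        // Same items in the same order, or in reverse order.
        return (x.Item1 == y.Item1 && x.Item2 == y.Item2) ||
               (x.Item1 == y.Item2 && x.Item2 == y.Item1);
    }

    public int GetHashCode(Tuple<string, string> obj)
    {
        // XOR is symmetric, so swapping the items does not change the hash.
        int h1 = obj.Item1 == null ? 0 : obj.Item1.GetHashCode();
        int h2 = obj.Item2 == null ? 0 : obj.Item2.GetHashCode();
        return h1 ^ h2;
    }
}

public static class PairDedup
{
    // Hypothetical helper: Distinct yields each unordered pair the first time it is seen.
    public static List<Tuple<string, string>> RemoveReverseDuplicates(
        IEnumerable<Tuple<string, string>> pairs)
    {
        return pairs.Distinct(new UnorderedPairComparer()).ToList();
    }
}
```

Calling PairDedup.RemoveReverseDuplicates(conflicts) on the four pairs from the question should leave ("Maths", "English") and ("Science", "French").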

nobody · Accepted Answer · 2016-03-10T23:17:07.823

using System;
using System.Collections.Generic;

List<Tuple<string, string>> conflicts = new List<Tuple<string, string>>();
List<Tuple<string, string>> noConflicts = new List<Tuple<string, string>>();

conflicts.Add(new Tuple<string, string>("Maths", "English"));
conflicts.Add(new Tuple<string, string>("Science", "French"));
conflicts.Add(new Tuple<string, string>("French", "Science"));
conflicts.Add(new Tuple<string, string>("English", "Maths"));

// Keep a pair only if neither it nor its reverse has been kept already
foreach (Tuple<string, string> t in conflicts)
{
    if (!noConflicts.Contains(t) && !noConflicts.Contains(new Tuple<string, string>(t.Item2, t.Item1)))
        noConflicts.Add(t);
}

foreach (Tuple<string, string> t in noConflicts)
    Console.WriteLine(t.Item1 + "," + t.Item2);

I am sure there are better ways, but it works.
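Each Contains call scans the whole noConflicts list, so this loop does quadratic work as the list grows. Below is a sketch of the same loop that stays linear. It assumes C# 7 value tuples (System.ValueTuple) are available. Each pair is normalized to an ordinally sorted key, and HashSet<T>.Add, which returns false when the key is already present, decides whether to keep the pair. The names LoopDedup and RemoveReverseDuplicates are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class LoopDedup
{
    // Keeps the first occurrence of each pair, treating (A, B) and (B, A) as equal.
    public static List<Tuple<string, string>> RemoveReverseDuplicates(
        List<Tuple<string, string>> conflicts)
    {
        var seen = new HashSet<(string, string)>();
        var noConflicts = new List<Tuple<string, string>>();

        foreach (Tuple<string, string> t in conflicts)
        {
            // Put the two items in a fixed (ordinal) order so both orderings give the same key.
            (string, string) key = string.CompareOrdinal(t.Item1, t.Item2) <= 0
                ? (t.Item1, t.Item2)
                : (t.Item2, t.Item1);

            // Add returns false when the key is already in the set.
            if (seen.Add(key))
                noConflicts.Add(t);
        }

        return noConflicts;
    }
}
```

The original tuples are kept in their original order, so the output matches the loop above: "Maths,English" followed by "Science,French".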

Thanks for this. Other solutions may have been cleaner, but yours was the most straightforward for me to understand and use at my current level of ability, so I marked it as the accepted answer. Take care. – Dude365, Mar 11 '16 at 09:30

score 3 · Answer 3 · answered Mar 10 '16 at 22:51

A rather crude implementation:

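var distinct =
    conflicts
        .GroupBy(
            x =>
                {
                    // Sort the two items so that both orderings produce the same key
                    var ordered = new[] { x.Item1, x.Item2 }.OrderBy(i => i);
                    return
                        new
                        {
                            Item1 = ordered.First(),
                            Item2 = ordered.Last(),
                        };
                })
        // Each group holds every ordering of one pair; keep the first tuple seen
        .Select(g => g.First())
        // .Dump() is LINQPad-only; ToList() materializes the result in ordinary code
        .ToList();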

It orders the items in each tuple so that Maths,English and English,Maths are the same. It then puts them into an anonymous type (calling the members Item1/Item2 again) and relies on the structural equality of anonymous types, so GroupBy places both orderings in the same group. Finally, I just pull out the first tuple from each group.

score 1 · Answer 4 · edited May 23 '17 at 11:53

The problem is that you're misusing Tuple<T,Y>. If { "Math", "Science" } and { "Science", "Math" } are interchangeable, then they're not ordered pairs; you're using the tuple more like a string[2]. By contrast, in a Dictionary<TKey,TValue>, whose entries are KeyValuePair<TKey,TValue>, the key and the value are meaningfully separate things with a proper pair relationship, not just a list of data.

Try using something like List<List<string>>, which better represents your data and gives you access to useful List<T> answers, like this one. Or indeed List<Conflict>, where Conflict holds a list of subjects and its equality ignores their order.
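Here is a minimal sketch of the List<Conflict> idea. It assumes a conflict always involves exactly two subjects. The Conflict class below is hypothetical. Its constructor stores the subjects in ordinal order, so Equals and GetHashCode can compare fields directly without caring which order the caller used.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical type: a conflict between two subjects where order does not matter.
public sealed class Conflict : IEquatable<Conflict>
{
    public string First { get; }
    public string Second { get; }

    public Conflict(string a, string b)
    {
        // Normalize once: the ordinally smaller subject is always stored first.
        if (string.CompareOrdinal(a, b) <= 0) { First = a; Second = b; }
        else { First = b; Second = a; }
    }

    public bool Equals(Conflict other) =>
        !(other is null) && First == other.First && Second == other.Second;

    public override bool Equals(object obj) => Equals(obj as Conflict);

    // Fields are already normalized, so an order-dependent hash is fine here.
    public override int GetHashCode() =>
        (First?.GetHashCode() ?? 0) * 31 + (Second?.GetHashCode() ?? 0);

    public override string ToString() => First + "," + Second;
}
```

With this type, new HashSet<Conflict> { new Conflict("Maths", "English"), new Conflict("English", "Maths") } ends up with a single element.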

Frozenthia · Answer 5 · 2016-03-10T23:31:36.260


LINQ one-liner. Gotta love it.

var noConflicts = conflicts.Select(c => new HashSet<string>() { c.Item1, c.Item2})
    .Distinct(HashSet<string>.CreateSetComparer())
    .Select(h => new Tuple<string, string>(h.First(), h.Last()));

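This works by converting each tuple into a HashSet<string>. HashSet<T>.CreateSetComparer() returns an IEqualityComparer<HashSet<T>> that compares sets by their contents, so Distinct() treats {Maths, English} and {English, Maths} as equal regardless of order.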
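Below is a small, self-contained check of that behaviour, with the one-liner wrapped in a method and materialized with ToList(). The class and method names are hypothetical. One edge case is worth knowing. A pair with the same value twice, such as ("Maths", "Maths"), becomes a one-element set. First() and Last() then both return that value, so the pair is rebuilt unchanged.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SetComparerDemo
{
    // Returns true when the two pairs contain the same values, in any order.
    public static bool SameUnorderedPair(Tuple<string, string> a, Tuple<string, string> b)
    {
        IEqualityComparer<HashSet<string>> comparer = HashSet<string>.CreateSetComparer();
        var setA = new HashSet<string> { a.Item1, a.Item2 };
        var setB = new HashSet<string> { b.Item1, b.Item2 };
        return comparer.Equals(setA, setB);
    }

    // The answer's one-liner, wrapped in a method and materialized with ToList().
    public static List<Tuple<string, string>> Dedup(IEnumerable<Tuple<string, string>> conflicts)
    {
        return conflicts
            .Select(c => new HashSet<string> { c.Item1, c.Item2 })
            .Distinct(HashSet<string>.CreateSetComparer())
            .Select(h => new Tuple<string, string>(h.First(), h.Last()))
            .ToList();
    }
}
```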

answered Mar 10 '16 at 23:24 · edited Mar 10 '16 at 23:31

Chris Hayes · Answer 6 · 2016-03-10T23:32:03.010

using System;
using System.Collections.Generic;
using System.Linq;

public class Program
{
    public static void Main()
    {
        var conflicts = new List<Tuple<string, string>>();
        conflicts.Add(new Tuple<string, string>("Maths", "English"));
        conflicts.Add(new Tuple<string, string>("Science", "French"));
        conflicts.Add(new Tuple<string, string>("French", "Science"));
        conflicts.Add(new Tuple<string, string>("English", "Maths"));

        RemoveDupes(conflicts);
        foreach (var i in conflicts) Console.WriteLine(i.Item1 + " " + i.Item2);
    }

    public static void RemoveDupes(List<Tuple<string, string>> collection)
    {
        var duplicateIndexes = collection
            // normalize each pair so it no longer matters which value comes first
            .Select((x, i) => new
            {
                Item = new Tuple<string, string>(
                    x.Item2.IsGreaterThan(x.Item1) ? x.Item2 : x.Item1,
                    x.Item2.IsGreaterThan(x.Item1) ? x.Item1 : x.Item2),
                Index = i
            })
            // group on the normalized values
            .GroupBy(x => x.Item)
            // within each group, every occurrence after the first is a duplicate
            .SelectMany(g => g.Skip(1))
            .Select(x => x.Index)
            // remove from the highest index down so earlier removals don't shift later indexes
            .OrderByDescending(i => i)
            .ToList();

        foreach (var index in duplicateIndexes)
        {
            collection.RemoveAt(index);
        }
    }
}

public static class Ext
{
    public static bool IsGreaterThan(this string val, string compare)
    {
        // CompareTo only guarantees a positive result, not exactly 1
        return val.CompareTo(compare) > 0;
    }
}

Antonín Lejsek · Answer 7 · 2016-03-12T02:01:30.687

The best way to avoid AB/BA ambiguity of representation is to have a data model that does not allow it. You can achieve that by imposing constraints; in databases this is a widely used approach. If we require that the tuple is ordered, no ambiguity can occur:

public class Ordered2StrTuple : Tuple<string, string> 
{
    public Ordered2StrTuple(string a, string b)
        : this(a, b, String.CompareOrdinal(a,b))
    { }

    private Ordered2StrTuple(string a, string b, int cmp)
        : base(cmp > 0 ? b : a, cmp > 0 ? a : b)
    { }
}

Now the task is really easy:

var noConflicts = conflicts
    .Select(s => new Ordered2StrTuple(s.Item1, s.Item2))
    .Distinct();

The comparison needs to be ordinal to be consistent with Equals, so I removed the generic version I had here. If you only want to do a one-time deduplication, you can do it like this:

var noConflicts = conflicts.Select(t =>
    String.CompareOrdinal(t.Item1, t.Item2) > 0 ? new Tuple<string, string>(t.Item2, t.Item1) : t
    ).Distinct();
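On C# 7.0 or later (with System.ValueTuple available), the same one-time normalization can be written with value tuples, which already compare element by element. This is a sketch, not the answer's own code. The class and method names are hypothetical, and the method returns value tuples rather than Tuple<string, string>.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ValueTupleDedup
{
    // Normalizes each pair so the ordinally smaller string comes first, then drops repeats.
    public static List<(string, string)> RemoveReverseDuplicates(
        IEnumerable<Tuple<string, string>> conflicts)
    {
        return conflicts
            .Select(t =>
            {
                (string, string) key = string.CompareOrdinal(t.Item1, t.Item2) > 0
                    ? (t.Item2, t.Item1)
                    : (t.Item1, t.Item2);
                return key;
            })
            .Distinct()
            .ToList();
    }
}
```

As with the version above, each pair comes back in ordinal order, so ("Maths", "English") is returned as ("English", "Maths").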
