Removing duplicate pairs of a List

Question

Issue

I'm displaying a geometry using simple lines in a 3D WPF library. An example of it could be seen in the next picture:

enter image description here

In it you can see a set of triangles and quads. The way I'm plotting this is that I provide a List<Point3D> in which I place pairs of points representing each segment.

The problem is that there are a lot of duplicate edges and I would like to avoid this as this type of representation seems to be very resource demanding.

The points list is generated iterating over each Element that contains N vertices. It is not aware of if a particular edge is shared or not.

<p0, p1, p1, p2, p2, p333, p333, p89, p89, p2, p2, p1 ...>

The idea would be to remove the repeated pairs (note that the order could be not the same). In the example above the removed pair should be the last one (p2, p1), as it represents the same edge as the second pair of points (p1, p2). There could be one, two, or more duplicate pairs of points.

I need to do this operation as fast as possible, performance is top prio here.

Ideas

While adding points in the list I could store temporarily two of them and check if the list already contains them, but this implies looking into a list each time I add a point and doesn't seems a good idea to me (the list will contain several thousand of points 5000-50000).

The elements of which I generate the points list have several nodes with an unique ID so I think it could be used somehow by creating a Dictionary of ordered Tuple<Point3D, Point3D> and then removing the duplicate items.

I haven't tried the last idea as I'm not sure how to implement it yet, and I would like to hear if there is some other thing that can be done.

Your example just gives a list of points which have no connection. It would be easy if represented as pairs / tuples. Just write a comparer that tells you whether the same coordinates are used, which means they are the same and can be removed. — Mare Infinitus, Jul 20 '14 at 08:45
This list interpreted in a way that each pair is converted into a segment. I think comparing the node ID it would be much faster than looking to the coordinates. — Sturm, Jul 20 '14 at 08:49

score 3 · Answer 1 · edited Jul 20 '14 at 10:34

You can use HashSet to store all the edges. It is fast to check, is the edge is already in set. But you should override GetHashCode and Equals. I made a simple example.

class MyLine
{
    public MyPoint P1 { get; private set; }
    public MyPoint P2 { get; private set; }
    public MyLine(MyPoint p1, MyPoint p2)
    {
        P1 = p1;
        P2 = p2;
    }
    protected bool Equals(MyLine other)
    {
        return (Equals(P1, other.P1) && Equals(P2, other.P2)) || Equals(P1, other.P2) && Equals(P2, other.P1);
    }
    public override bool Equals(object obj)
    {
        if (ReferenceEquals(null, obj)) return false;
        if (ReferenceEquals(this, obj)) return true;
        if (obj.GetType() != this.GetType()) return false;
        return Equals((MyLine)obj);
    }
    public override int GetHashCode()
    {
        unchecked
        {
            return P1.GetHashCode() + P2.GetHashCode();
        }
    }
}
class MyPoint
{
    public string Id { get; private set; }
    public MyPoint(string id)
    {
        Id = id;
    }
    protected bool Equals(MyPoint other)
    {
        return string.Equals(Id, other.Id);
    }
    public override bool Equals(object obj)
    {
        if (ReferenceEquals(null, obj)) return false;
        if (ReferenceEquals(this, obj)) return true;
        if (obj.GetType() != this.GetType()) return false;
        return Equals((MyPoint)obj);
    }
    public override int GetHashCode()
    {
        return (Id != null ? Id.GetHashCode() : 0);
    }
}

then you should be able to add each line like this:

public static void Main(string[] args)
{
    HashSet<MyLine> lines = new HashSet<MyLine>();
    var line = new MyLine(new MyPoint("a"), new MyPoint("b"));
    lines.Add(line);
    line = new MyLine(new MyPoint("b"), new MyPoint("a"));
    lines.Add(line);
}

Also with GetHashCode and Equals you can store all the lines in List and then use Distinct method.

public static void Main(string[] args)
{
    List<MyLine> lines = new List<MyLine>();
    var line = new MyLine(new MyPoint("a"), new MyPoint("b"));
    lines.Add(line);
    line = new MyLine(new MyPoint("b"), new MyPoint("a"));
    lines.Add(line);
    lines = lines.Distinct().ToList();
}

Take another look at your `GetHashCode` method. As it stands it's just adding a constant to the sum of the hash codes of the fields. — Matthew Strawbridge, Jul 20 '14 at 10:09
@MatthewStrawbridge yes,i just upgraded this http://stackoverflow.com/a/263416/1237491 to be symetrical for points. Not sure if it is a good hash code. Any sugestions how to improve it? — Dmitrii Dovgopolyi, Jul 20 '14 at 10:19
Okay, I've updated it. The point is that the whole hash needs to be multiplied by 23 at each step, and you were missing that after the first field. — Matthew Strawbridge, Jul 20 '14 at 10:26
@MatthewStrawbridge no, i made it on purpose. Look at equals method. The hash must be the same for (p1,p2) and (p2, p1). — Dmitrii Dovgopolyi, Jul 20 '14 at 10:30
Right, I see what you mean. Then you might as well just use the sum without any primes: they don't add anything (i.e. it's as if you've only got one field, not two). — Matthew Strawbridge, Jul 20 '14 at 10:33

zmbq · Answer 2 · 2014-07-20T09:38:42.977

1

Use a HashSet<Tuple<Point3D, Point3D>> . Whenever you get a new edge - p1,p2, check for (p1,p2)'s existence in the set. Also check for (p2,p1)'s existence. If neither exists, add (p1,p2) and (p2,p1) to the set and use the edge.

You can further speed this up by crafting your own hash and equality functions that will see (p1,p2) as equal to (p2,p1). Don't do that unless you need to, Set operations are very quick, I doubt the improvement will be considerable.

edited Jul 20 '14 at 09:38

answered Jul 20 '14 at 08:55

zmbq

38,013
14
101
171

2

1. C# does not have `Set`, only `HashSet`. 2. You should either check for both `(p1,p2) (p2,p1)` or don't check anything and add both `(p1,p2) (p2,p1)`. HashSet will automaticly skip the adding if the tuple already exist. – Dmitrii Dovgopolyi Jul 20 '14 at 09:28

2 Answers2