8

I have a List of objects, some of which share the same Id, and I would like to remove the duplicated elements.

I tried with something like this:

List<Post> posts = postsFromDatabase.Distinct().ToList();

But it doesn't work!

So I wrote this method in order to avoid the duplicates:

public List<Post> PostWithOutDuplicates(List<Post> posts)
{
    List<Post> postWithOutInclude = new List<Post>();
    var noDupes = posts.Select(x => x.Id).Distinct();
    if (noDupes.Count() < posts.Count)
    {
        foreach (int idPost in noDupes)
        {
            postWithOutInclude.Add(posts.Where(x => x.Id == idPost).First());
        }
        return postWithOutInclude;
    }
    else
    {
        return posts;
    }
}

Any ideas on how to improve the performance?

Thanks in advance.

Piotr Justyna
Javier Hertfelder

3 Answers

32

This is nice and easy:

List<Post> posts = postsFromDatabase
    .GroupBy(x => x.Id)
    .Select(g => g.FirstOrDefault())
    .ToList();

But if you want to write it the proper way, I'd advise you to write it like this:

public class PostComparer : IEqualityComparer<Post>
{
    #region IEqualityComparer<Post> Members

    public bool Equals(Post x, Post y)
    {
        return x.Id.Equals(y.Id);
    }

    public int GetHashCode(Post obj)
    {
        return obj.Id.GetHashCode();
    }

    #endregion
}

This gives you more freedom when it comes to additional comparisons. Having written this class, you can use it like this:

List<Post> posts = postsFromDatabase.Distinct(new PostComparer()).ToList();
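
To try both approaches side by side, here is a minimal self-contained sketch. The `Post` class with `Id` and `Title` properties and the `PostDedupe` helper are assumptions made for illustration, since the question does not show the real entity. The null checks in the comparer are also an addition that is not part of the class above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the Post entity from the question.
public class Post
{
    public int Id { get; set; }
    public string Title { get; set; }
}

// Same idea as the comparer above: two posts are equal when their Ids match.
// The null handling is an extra safety net, not part of the original answer.
public class PostComparer : IEqualityComparer<Post>
{
    public bool Equals(Post x, Post y)
    {
        if (ReferenceEquals(x, y)) return true;    // same instance, or both null
        if (x == null || y == null) return false;  // exactly one is null
        return x.Id == y.Id;
    }

    public int GetHashCode(Post obj)
    {
        // Must agree with Equals: equal Ids give equal hash codes.
        return obj == null ? 0 : obj.Id.GetHashCode();
    }
}

// Hypothetical helper that wraps the two approaches from this answer.
public static class PostDedupe
{
    // Groups by Id and keeps the first post of each group.
    public static List<Post> ViaGroupBy(IEnumerable<Post> posts)
    {
        return posts.GroupBy(p => p.Id)
                    .Select(g => g.First())
                    .ToList();
    }

    // Lets Distinct() decide equality through the custom comparer.
    public static List<Post> ViaComparer(IEnumerable<Post> posts)
    {
        return posts.Distinct(new PostComparer()).ToList();
    }
}
```

Calling either `PostDedupe.ViaGroupBy(postsFromDatabase)` or `PostDedupe.ViaComparer(postsFromDatabase)` on an in-memory list should leave one post per Id. `g.First()` is safe inside the in-memory `GroupBy` because a group always contains at least one element.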
Piotr Justyna
  • 2
    I think in `GetHashCode` you must use `obj.Id.GetHashCode()` because the hashcode must be the same for two objects which are equal according to the `Equals` method (at least MSDN says this). – Slauma Dec 16 '11 at 16:55
  • Well spotted! There should be Id.GetHashCode(), you're right. If anyone's interested: http://msdn.microsoft.com/en-us/library/ms132151.aspx – Piotr Justyna Dec 16 '11 at 17:15
  • 3
    This will only do the comparison once the data is in memory. Not good. Use the GroupBy approach: http://stackoverflow.com/questions/8560884/how-to-implement-iequalitycomparer-to-return-distinct-values – Markus Knappen Johansson Feb 20 '13 at 15:51
  • Thanks for your comment and the link, Markus. It's a pity the question is so old, because we could clarify with the OP whether the objects (postsFromDatabase) are already in memory. At the time I wrote this answer *I think* everybody assumed they were, hence I advised using the IEqualityComparer since (judging from my experience) it proves to be less expensive. – Piotr Justyna Feb 20 '13 at 16:45
5

I think that writing your own custom comparer is a good approach.

Here is an MSDN article that explains the topic very well: http://support.microsoft.com/kb/320727

The reason Distinct() isn't working is that it has no idea how to determine whether two of your objects are equal, so it falls back to comparing references to decide whether they are the same "object". It's working the way it's supposed to work: the instances returned by the query are all different objects, even when their Ids match.

By writing your own comparer (it's easy) you can tell Distinct() how to compare the objects to determine whether they are equal.
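
The reference comparison described above can be seen in a small sketch. The `Post` class here is an assumed stand-in with only an `Id` property, and `ReferenceEqualityDemo` is a hypothetical name. What matters is that the class does not override `Equals` or `GetHashCode`, which is presumably the situation in the question.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the Post entity; it does not override
// Equals or GetHashCode, so the default comparison is by reference.
public class Post
{
    public int Id { get; set; }
}

public static class ReferenceEqualityDemo
{
    // Two separate instances that share an Id: Distinct() keeps both.
    public static int CountSeparateInstances()
    {
        var posts = new List<Post>
        {
            new Post { Id = 1 },
            new Post { Id = 1 }
        };
        return posts.Distinct().Count(); // 2
    }

    // The very same instance added twice: Distinct() collapses it to one.
    public static int CountSameInstance()
    {
        var post = new Post { Id = 1 };
        var posts = new List<Post> { post, post };
        return posts.Distinct().Count(); // 1
    }
}
```

Passing a comparer that looks at `Id` (or overriding `Equals` and `GetHashCode` on the class itself) is what makes the first case return a single post.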

Edit: If not using Distinct isn't a problem and the situation isn't frequent, the first answer, by Piotr Justyna, is simple and effective.

jm.
Jonathan
0

Instead of .First(), try .FirstOrDefault().
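
For context, a short sketch of how the two calls differ. The class and method names are made up for illustration. In the question's loop every id is taken from the list itself, so the `Where(...)` filter always matches at least one post and both calls should behave the same there.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical demo class; the names are not from the question.
public static class FirstVersusFirstOrDefault
{
    // First() throws InvalidOperationException when the sequence is empty.
    public static bool FirstThrowsOnEmpty()
    {
        var empty = new List<int>();
        try
        {
            empty.First();
            return false;
        }
        catch (InvalidOperationException)
        {
            return true;
        }
    }

    // FirstOrDefault() returns default(T) instead: 0 for int, null for classes.
    public static int FirstOrDefaultOnEmpty()
    {
        var empty = new List<int>();
        return empty.FirstOrDefault();
    }
}
```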

Brian