
I can retrieve duplicates using this query:

var duplicates = grpDupes
    .GroupBy(i => new { i.Email })
    .Where(g => g.Count() > 1)
    .SelectMany(g => g);

But I want to find records that are duplicates on either Email or Address (or any other field). If I modify the query above to

GroupBy(i => new { i.Email, i.Address }) 

then it becomes an AND condition: both fields must match. How can I get an OR condition instead?

Tim Schmelter

3 Answers


You can use the GroupBy overload that accepts an IEqualityComparer<T>. The following factory builds such a comparer from lambdas:

using System;
using System.Collections.Generic;

/// <summary>
/// Factory class which creates an EqualityComparer based on lambda expressions.
/// </summary>
/// <typeparam name="T">The type of which a new equality comparer is to be created.</typeparam>
public static class EqualityComparerFactory<T>
{
    private class MyComparer : IEqualityComparer<T>
    {
        private readonly Func<T, int> _getHashCodeFunc;
        private readonly Func<T, T, bool> _equalsFunc;

        public MyComparer(Func<T, T, bool> equalsFunc, Func<T, int> getHashCodeFunc = null)
        {
            _getHashCodeFunc = getHashCodeFunc ?? (a=>0);
            _equalsFunc = equalsFunc;
        }

        public bool Equals(T x, T y)
        {
            return _equalsFunc(x, y);
        }

        public int GetHashCode(T obj)
        {
            return _getHashCodeFunc(obj);
        }
    }

    /// <summary>
    /// Creates an <see cref="IEqualityComparer{T}" /> based on an equality function and optionally on a hash function.
    /// </summary>
    /// <param name="equalsFunc">The equality function.</param>
    /// <param name="getHashCodeFunc">The hash function.</param>
    /// <returns>
    /// A typed Equality Comparer.
    /// </returns>
    public static IEqualityComparer<T> CreateComparer(Func<T, T, bool> equalsFunc, Func<T, int> getHashCodeFunc = null)
    {
        if (equalsFunc == null) throw new ArgumentNullException(nameof(equalsFunc));

        return new MyComparer(equalsFunc, getHashCodeFunc);
    }
}

Sample Usage:

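var comparer = EqualityComparerFactory<YourClassHere>.CreateComparer((a, b) => a.Address == b.Address || a.Email == b.Email);

// No hash function is passed, so every item gets hash code 0 and only Equals decides grouping.
var duplicates = data.GroupBy(a => a, comparer)
    .Where(g => g.Count() > 1)
    .SelectMany(g => g);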
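A caveat worth checking before relying on this: "Email OR Address" equality is not transitive, while IEqualityComparer<T> assumes it is. With a constant hash code, GroupBy compares each item only against the key (the first element) of every group formed so far, so an item that matches only a non-key member of a group ends up in a group of its own. The sketch below reproduces this with a hypothetical Contact record and a hand-written EmailOrAddressComparer that behaves like the comparer the factory produces (both names are illustrative, not part of the answer). C shares an Email with B, yet only A and B are reported.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical record type, used only for this illustration.
public record Contact(string Name, string Email, string Address);

// Behaves like the factory's comparer: OR-equality plus a constant hash code.
public sealed class EmailOrAddressComparer : IEqualityComparer<Contact>
{
    public bool Equals(Contact? x, Contact? y)
    {
        if (x is null || y is null) return ReferenceEquals(x, y);
        return x.Email == y.Email || x.Address == y.Address;
    }

    // A constant is the only hash code consistent with OR-equality.
    public int GetHashCode(Contact obj) => 0;
}

public static class OrComparerDemo
{
    public static List<Contact> Sample() => new List<Contact>
    {
        new Contact("A", "a@example.com", "1 Main St"),
        new Contact("B", "b@example.com", "1 Main St"), // same Address as A
        new Contact("C", "b@example.com", "2 Oak Ave"), // same Email as B, nothing in common with A
    };

    // Names of items that land in a group with more than one member.
    public static List<string> DuplicateNames(IEnumerable<Contact> data) =>
        data.GroupBy(c => c, new EmailOrAddressComparer())
            .Where(g => g.Count() > 1)
            .SelectMany(g => g)
            .Select(c => c.Name)
            .ToList();
}
```

C is compared only against A (the key of the first group), so it starts a new group of one. Which item gets missed depends on input order, which is why checking each field separately, as the other answers do, is more reliable.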
AcidJunkie

You could use SQL's EXISTS, whose LINQ equivalent is Any:

var duplicates = grpDupes
    .Where(i => (i.Email.Trim() != "" || i.Address.Trim() != "")  && grpDupes
        .Any(i2 => i.ID != i2.ID && 
            ((i.Email.Trim()   != "" && i.Email   == i2.Email) || 
             (i.Address.Trim() != "" && i.Address == i2.Address))));

Note that I've used ID as the primary-key column. If you don't have one, use whichever column(s) uniquely identify a record.

If you use a database-driven LINQ provider such as LINQ to SQL or LINQ to Entities, this is efficient, because the whole query is executed by the database.
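If the data is already in memory (LINQ to Objects), i.Email.Trim() throws a NullReferenceException for a null Email, and the nested Any makes the query O(n²). A minimal null-safe sketch of the same EXISTS-style check, assuming a hypothetical Contact record whose ID is unique; string.IsNullOrWhiteSpace is fine here because nothing needs to be translated to SQL:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory record; ID plays the role of the primary key.
public record Contact(int ID, string? Email, string? Address);

public static class AnyDuplicates
{
    // True only when both values are non-blank and equal, so blanks never match.
    private static bool SameNonBlank(string? a, string? b) =>
        !string.IsNullOrWhiteSpace(a) && a == b;

    // Every contact that shares a non-blank Email OR a non-blank Address
    // with at least one other contact (a different ID).
    public static List<Contact> Find(IReadOnlyList<Contact> contacts) =>
        contacts
            .Where(c => contacts.Any(other =>
                other.ID != c.ID &&
                (SameNonBlank(c.Email, other.Email) ||
                 SameNonBlank(c.Address, other.Address))))
            .ToList();
}
```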

Tim Schmelter
  • This seems to work; can anyone tell me why I should not use this? – user3636316 May 14 '14 at 11:38
  • Can we improve this by ignoring blank emails or addresses in (i.Email == i2.Email || i.Address == i2.Address)? If the emails in both records are empty, it also treats them as duplicates. – user3636316 May 14 '14 at 12:22
  • @user3636316: I have edited my answer to provide a way that should work with all LINQ providers (as opposed to `String.IsNullOrWhiteSpace`, for example). – Tim Schmelter May 14 '14 at 12:35

I'd keep this very simple by using .ToLookup().

How about this?

var emailLookup = grpDupes.ToLookup(x => x.Email);
var addressLookup = grpDupes.ToLookup(x => x.Address);

var duplicates = grpDupes
    .Where(x =>
        emailLookup[x.Email].Count() > 1 || addressLookup[x.Address].Count() > 1);
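This approach has the same issue raised in the comments on the previous answer: blank values are keys too, so every record with an empty Email counts as a duplicate of every other one. A minimal sketch that builds each lookup only from non-blank values, assuming a hypothetical Contact record with nullable Email and Address properties:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical record type, used only for this illustration.
public record Contact(string Name, string? Email, string? Address);

public static class LookupDuplicates
{
    public static List<Contact> Find(IReadOnlyCollection<Contact> contacts)
    {
        // Only non-blank values become keys, so blank fields can never match.
        var emailLookup = contacts
            .Where(c => !string.IsNullOrWhiteSpace(c.Email))
            .ToLookup(c => c.Email);
        var addressLookup = contacts
            .Where(c => !string.IsNullOrWhiteSpace(c.Address))
            .ToLookup(c => c.Address);

        // Each contact is in its own group, so "> 1" means someone else shares the value.
        return contacts
            .Where(c => emailLookup[c.Email].Count() > 1
                     || addressLookup[c.Address].Count() > 1)
            .ToList();
    }
}
```

Because a lookup's indexer returns an empty sequence for a key it does not contain, blank and null values simply count as zero matches.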
Enigmativity