1

I get a lot of data from a database as the results of a search function. I now have a List<string[]> that contains duplicate string[] elements. Each string[] in the list is one search result.

I know that every newly created array is a different instance, so I can't use MyListOfArrays.Distinct().ToList().
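
A minimal sketch of the problem described above, assuming two separately allocated arrays with identical contents. The class and method names are illustrative and not part of the original code:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class ReferenceEqualityDemo
{
    // Counts the elements that survive a plain Distinct() call.
    public static int CountDistinctByReference(List<string[]> list)
    {
        // Without a comparer, Distinct() uses EqualityComparer<string[]>.Default,
        // which compares arrays by reference, not by content.
        return list.Distinct().Count();
    }

    static void Main()
    {
        var list = new List<string[]>
        {
            new[] { "a", "b" },
            new[] { "a", "b" }   // same content, different instance
        };

        Console.WriteLine(CountDistinctByReference(list)); // prints 2
    }
}
```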

Maybe it's a very basic question...

My question is: is there a built-in function to remove duplicate string[] elements from the List<string[]>, or do I have to write it myself?

Thank you

Sinya
  • You can use `.Distinct()` with a custom `EqualityComparer`: http://stackoverflow.com/questions/4607485/linq-distinct-use-delegate-for-equality-comparer – valverij Oct 01 '13 at 14:18
  • While not a direct answer to your question, it may be better to modify the query from the database so it does not give you duplicate results in its result set. – Scott Chamberlain Oct 01 '13 at 14:19
  • Unfortunately, this is not possible. I just call an API function. I wrote a loop because the API allows only one search field and I need to search in more than one field. This is why there are duplicates. – Sinya Oct 01 '13 at 14:26
  • Do you need to remove whole duplicate list or do you need to remove duplicated strings only? – Ufuk Hacıoğulları Oct 01 '13 at 14:31
  • The whole duplicate. It is always an array with the same values. – Sinya Oct 01 '13 at 14:34

3 Answers

3

You can use the Distinct method with a custom equality comparer:

    IEnumerable<string[]> distinct = inputStringArrayList.Distinct(new EqualityComparer());

EqualityComparer

class EqualityComparer : IEqualityComparer<string[]>
{
    public bool Equals(string[] x, string[] y)
    {
        if (x.Length != y.Length)
        {
            return false;
        }
        if (x.Where((t, i) => t != y[i]).Any())
        {
            return false;
        }
        return true;
    }

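    public int GetHashCode(string[] obj)
    {
        // Bug: arrays inherit a reference-based GetHashCode, so two arrays with
        // equal content get different hashes and Distinct() almost never reaches
        // Equals. The corrected GetHashCode further down replaces this method.
        return obj.GetHashCode();
    }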
}

Alternative Equals Method

public bool Equals(string[] x, string[] y)
{
    return x.SequenceEqual(y);
}

Here I am assuming that duplicate arrays contain exactly the same strings at the same indexes.
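
As a small illustration of that assumption, the sketch below shows that `SequenceEqual` compares element by element in order, so arrays holding the same strings in a different order are not treated as duplicates. The class and method names are made up for this example:

```csharp
using System;
using System.Linq;

static class SequenceEqualDemo
{
    // Hypothetical helper: true only when both arrays hold equal strings
    // at every index (same length, same order).
    public static bool SameContent(string[] x, string[] y)
    {
        return x.SequenceEqual(y);
    }

    static void Main()
    {
        string[] a = { "x", "y" };
        string[] b = { "x", "y" };
        string[] c = { "y", "x" };

        Console.WriteLine(SameContent(a, b)); // True
        Console.WriteLine(SameContent(a, c)); // False: same strings, different order
    }
}
```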

Correction from Matthew Watson

public int GetHashCode(string[] obj)
{
    if (obj == null)
        return 0;

    int hash = 17;

    unchecked
    {
        foreach (string s in obj)
            hash = hash*23 + ((s == null) ? 0 : s.GetHashCode());
    }

    return hash;
}
Hossain Muctadir
2

I have corrected the answer from @Muctadir Dinar.

(He deserves credit for the answer - I am just correcting it and providing a complete test program):

using System;
using System.Collections.Generic;
using System.Linq;

namespace Demo
{
    sealed class EqualityComparer: IEqualityComparer<string[]>
    {
        public bool Equals(string[] x, string[] y)
        {
            if (ReferenceEquals(x, y))
                return true;

            if (x == null || y == null)
                return false;

            return x.SequenceEqual(y);
        }

        public int GetHashCode(string[] obj)
        {
            if (obj == null)
                return 0;

            int hash = 17;

            unchecked
            {
                foreach (string s in obj)
                    hash = hash*23 + ((s == null) ? 0 : s.GetHashCode());
            }

            return hash;
        }
    }

    class Program
    {
        private void run()
        {
            var list = new List<string[]>
            {
                strings(1, 10), 
                strings(2, 10), 
                strings(3, 10), 
                strings(2, 10), 
                strings(4, 10)
            };

            dump(list);
            Console.WriteLine();

            var result = list.Distinct(new EqualityComparer());
            dump(result);
        }

        static void dump(IEnumerable<string[]> list)
        {
            foreach (var array in list)
                Console.WriteLine(string.Join(",", array));
        }

        static string[] strings(int start, int count)
        {
            return Enumerable.Range(start, count)
                .Select(element => element.ToString())
                .ToArray();
        }

        static void Main(string[] args)
        {
            new Program().run();
        }
    }
}
Matthew Watson
  • Very nice. Thank you, this works. Edit: Accepted Muctadir Dinar's answer. – Sinya Oct 01 '13 at 14:46
  • @Patrick Why would you accept a completely incorrect answer with several major errors that prevent it from working and would require non-trivial changes to fix? – Servy Oct 01 '13 at 14:55
  • @Servy My fault for saying "he deserves credit for the answer" I think. ;) I said that because didn't want to "steal" the idea of providing an IEqualityComparer implementation. – Matthew Watson Oct 01 '13 at 14:56
  • @Servy I just followed the suggestion of Matthew Watson, which he wrote in the second line of his post. – Sinya Oct 01 '13 at 14:57
  • @MatthewWatson As an optimization, consider using only the first handful of values in the array when computing the hash. I usually `Take` the first 5 or 10. If you go beyond that, you spend more effort computing the hashes than you save by reducing collisions. – Servy Oct 01 '13 at 15:00
  • @servy That's a good point, but on the other hand if this is a general solution we have to beware the problem that early versions of the Java string hash code algorithm had (http://en.wikipedia.org/wiki/Java_hashCode%28%29) – Matthew Watson Oct 01 '13 at 15:06
  • You should surround the `foreach` in `GetHashCode` with `unchecked`. http://stackoverflow.com/a/263416/284240 – Tim Schmelter Oct 01 '13 at 15:06
  • @TimSchmelter Ok, done - but of course this only matters if you're compiling with checked enabled. – Matthew Watson Oct 01 '13 at 15:08
  • Thanks guys. @MatthewWatson, you are right. My code does not work at all. I believe you won't mind if I replace my `GetHashCode` method with yours. – Hossain Muctadir Oct 01 '13 at 16:03
  • @MuctadirDinar That's totally fine. – Matthew Watson Oct 01 '13 at 16:28
  • Hi Matthew. Thank you for your post. I have a list of string arrays. I want to remove duplicates by checking only the first element of each string array. I tried to edit your code to achieve that but failed because IEqualityComparer confuses me. I currently manage to do it with nested for loops, but I know there has to be a more elegant way. Can you please help edit your code to achieve that? – Baz Guvenkaya Jun 17 '16 at 02:40
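
One possible way to approach the last comment's request is a comparer that looks only at the first element. This is a sketch, not part of the original answers: `FirstElementComparer` is a hypothetical name, and it assumes that null arrays, empty arrays, and arrays whose first element is null should all count as having the same (null) key.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical comparer: two arrays are "equal" when their first elements match.
sealed class FirstElementComparer : IEqualityComparer<string[]>
{
    // Assumption: null or empty arrays are keyed by null.
    static string Key(string[] array)
    {
        return (array == null || array.Length == 0) ? null : array[0];
    }

    public bool Equals(string[] x, string[] y)
    {
        // Static string.Equals is null-safe and compares ordinally.
        return string.Equals(Key(x), Key(y));
    }

    public int GetHashCode(string[] obj)
    {
        string key = Key(obj);
        return key == null ? 0 : key.GetHashCode();
    }
}

static class FirstElementDemo
{
    static void Main()
    {
        var list = new List<string[]>
        {
            new[] { "1", "a" },
            new[] { "2", "b" },
            new[] { "1", "c" }   // same first element as the first array
        };

        var result = list.Distinct(new FirstElementComparer()).ToList();
        Console.WriteLine(result.Count); // prints 2
    }
}
```

Equals and GetHashCode both derive from the same key, which is what Distinct needs: arrays that compare equal must also produce the same hash code.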
1

A simple and not very efficient approach would be to use string.Join on the string[]:

list = list
    .GroupBy(strArr => string.Join("|", strArr))
    .Select(g => g.First())
    .ToList();
Tim Schmelter
  • This can have problems if the strings have `|` values in them. According to this `{"a|b"}` is equal to `{"a", "b"}`. – Servy Oct 01 '13 at 14:53
  • @Servy: Yes, that's what I meant by `simple`. It might be sufficient or not. If the input is not arbitrary, you could choose a separator that cannot occur. – Tim Schmelter Oct 01 '13 at 15:00
  • My point is that it's not just slow. If it was always correct but possibly slow you need only be concerned with large amounts of data. When it's not always correct it's a much bigger concern as to whether or not it can be used in any given case. – Servy Oct 01 '13 at 15:02
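
The collision raised in these comments can be reproduced with a short sketch. The class and method names are illustrative only; the grouping key is the same `string.Join("|", ...)` expression used in the answer above:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class JoinKeyCollisionDemo
{
    // Counts the groups produced when arrays are keyed by their "|"-joined text.
    public static int CountGroupsByJoinedKey(List<string[]> list)
    {
        return list.GroupBy(strArr => string.Join("|", strArr)).Count();
    }

    static void Main()
    {
        var list = new List<string[]>
        {
            new[] { "a|b" },      // one element that contains the separator
            new[] { "a", "b" }    // two elements
        };

        // Both arrays join to "a|b", so they fall into the same group.
        Console.WriteLine(CountGroupsByJoinedKey(list)); // prints 1
    }
}
```

If the separator can appear in the data, a content-based IEqualityComparer<string[]> such as the one in the answers above avoids this ambiguity.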