
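I have a list of type `List<KeyValuePair<byte[], string>>`, filled like this:

using System.Collections.Generic;
using System.IO;
using System.Security.Cryptography;

List<KeyValuePair<byte[], string>> fileHashList = new List<KeyValuePair<byte[], string>>();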

foreach (string entry in results)
{
    // Hash each file's contents and store the (hash, full path) pair.
    FileInfo fileInfo = new FileInfo(Path.Combine("DirectoryPath", entry));
    using (var md5 = MD5.Create())
    {
        using (var stream = File.OpenRead(fileInfo.FullName))
        {
            var hash = md5.ComputeHash(stream);
            fileHashList.Add(new KeyValuePair<byte[], string>(hash, fileInfo.FullName)); 
        }
    }
}

I need to find all the duplicate keys in this list.
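One way to sidestep the comparison problem entirely is to turn each hash into a string, because strings are compared by value. This is a different technique from the comparer-based fix discussed in the comments. The sketch below assumes the list is built as above; `HexKeyDuplicates` and `PathsBySharedHash` are hypothetical names. `BitConverter.ToString` produces hyphen-separated upper-case hex such as `01-02`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper (not from the question): finds files with identical
// MD5 hashes by grouping on a hex-string form of the hash.
public static class HexKeyDuplicates
{
    // Maps each hex-encoded hash that occurs more than once to the list of
    // file paths that share it.
    public static Dictionary<string, List<string>> PathsBySharedHash(
        IEnumerable<KeyValuePair<byte[], string>> fileHashList)
    {
        return fileHashList
            // Equal byte contents give equal strings, and the default string
            // comparer compares by value, so identical hashes end up in one group.
            .GroupBy(pair => BitConverter.ToString(pair.Key), pair => pair.Value)
            .Where(group => group.Count() > 1)
            .ToDictionary(group => group.Key, group => group.ToList());
    }
}
```

Calling `HexKeyDuplicates.PathsBySharedHash(fileHashList)` would return one entry per repeated hash. Each entry's `Value` lists the paths of the files that have that MD5 hash.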

I tried this, but it doesn't work in my case: I get "Enumeration yielded no results" even though some keys are identical!
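The comments below attribute the empty result to how `byte[]` values are compared by default. The following sketch uses a hypothetical class name, `ByteArrayEqualityDemo`. It contrasts the default equality comparer with an element-by-element `SequenceEqual` check.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical demo: why grouping byte[] keys with the default comparer
// does not put equal hashes together.
public static class ByteArrayEqualityDemo
{
    // The default comparer for arrays uses reference equality:
    // two distinct arrays with identical contents compare as NOT equal.
    public static bool DefaultEquals(byte[] a, byte[] b) =>
        EqualityComparer<byte[]>.Default.Equals(a, b);

    // Compares the contents element by element instead.
    public static bool ContentEquals(byte[] a, byte[] b) =>
        a.SequenceEqual(b);
}
```

For two separately created arrays `{ 1, 2 }`, `DefaultEquals` returns `false` while `ContentEquals` returns `true`. Each call to `ComputeHash` returns a new array, so every hash would land in its own group under the default comparer.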

Let me know if any additional data is needed. Thanks!

m_beta
  • It doesn't make sense, as byte[] arrays are compared by reference. – Selvin Sep 24 '20 at 13:09
  • Because `GroupBy` uses the default comparer, and for a byte array that's useless as it compares the object hash, which is different for arrays of equal content. See [duplicate](https://stackoverflow.com/questions/15841178/group-by-array-contents). Why do you have a byte array as key anyway? – CodeCaster Sep 24 '20 at 13:09
  • Please let me know how I should proceed, i.e. in which direction. – m_beta Sep 24 '20 at 13:10
  • See duplicates, use `.GroupBy(k => k.Key, StructuralComparisons.StructuralEqualityComparer)`. – CodeCaster Sep 24 '20 at 13:14
  • `var duplicates = fileHashList.GroupBy(x => x.Key).Where(x => x.Count() > 1).ToList();` – jdweng Sep 24 '20 at 13:14
  • @jdweng try reading the question and the comments. This will invoke the default equality comparer, which will not yield the same hash for arrays with the same contents. That code does not do any grouping, all keys will be reported as being different. – CodeCaster Sep 24 '20 at 13:15
  • @CodeCaster So do I necessarily need to override the `GetHashCode()` method, or is overriding just the `Equals()` method enough? Also, the reason I'm using a byte array as the key is that those are hash codes, and I want to check whether multiple files are the same. – m_beta Sep 24 '20 at 13:18
  • You can't override those for an array, you need to provide a custom comparer, see my other comment. – CodeCaster Sep 24 '20 at 13:21
  • @CodeCaster I used your other comment but I get `cannot convert from 'System.Collections.IEqualityComparer' to 'System.Collections.Generic.IEqualityComparer'` – m_beta Sep 24 '20 at 13:31
  • Yeah that's wrong, sorry. The quick fix would still be to create a custom comparer as in the first duplicate's accepted answer. – CodeCaster Sep 24 '20 at 13:33
  • @CodeCaster I created a custom comparer but I'm struggling to use it as I wish. I need to find all the duplicates as asked in the question – m_beta Sep 24 '20 at 13:40
  • @CodeCaster Can you just write the usage line for my case, using the comparer from your linked duplicate? – m_beta Sep 24 '20 at 13:41
  • `var duplicates = fileHashList.GroupBy(x => x.Key, new ArrayComparer()).Where(x => x.Count() > 1).ToList();` – CodeCaster Sep 24 '20 at 13:43
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/222022/discussion-between-m-beta-and-codecaster). – m_beta Sep 24 '20 at 14:11
  • No, edit your question to show what you have and why it doesn't work. It also wouldn't hurt to include a minimal reproducible example, where you initialize the list with some sample data. – CodeCaster Sep 24 '20 at 14:12
  • Sorry for bothering you, just one last question from my side. I would like to fetch all the duplicate records (key-value pairs), e.g. [0,1], [0,2], [0,3]: since the key is duplicated, I want all three records, not just one. Your query returns only one record per key (the key only); I want all the duplicate records with their key-value pairs. – m_beta Sep 24 '20 at 14:19
  • @CodeCaster I've updated my question as you asked. Please reply to my last comment. – m_beta Sep 24 '20 at 14:25
  • You haven't added the relevant code. We don't have your files. A minimal reproducible example would contain a collection initialization with some example byte arrays and strings, and then the code you use to do the grouping. Anyhow, a `GroupBy` returns a collection of groupings, with `Key` being the key and its values (enumerable) the items that have the same key. So something like `foreach (var d in duplicates) { var filesWithSameHash = d.ToList(); }`. – CodeCaster Sep 24 '20 at 15:23
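Putting the thread's advice together: `StructuralComparisons.StructuralEqualityComparer` is a non-generic `IEqualityComparer`, whereas `GroupBy` expects an `IEqualityComparer<TKey>`, which explains the conversion error above. A small generic comparer avoids the problem. The sketch below uses a hypothetical `ByteArrayComparer`, which is not necessarily identical to the `ArrayComparer` in the linked duplicate. It then flattens the groups with `SelectMany`, so that every record sharing a key is returned rather than one entry per key. `DuplicateFinder` and `AllDuplicateRecords` are also hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical comparer: treats two byte arrays as equal when their contents match.
public sealed class ByteArrayComparer : IEqualityComparer<byte[]>
{
    public bool Equals(byte[] x, byte[] y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x is null || y is null) return false;
        return x.SequenceEqual(y);
    }

    // Content-based hash: arrays with equal contents always get equal hash codes,
    // which GroupBy relies on before it calls Equals.
    public int GetHashCode(byte[] obj)
    {
        if (obj is null) return 0;
        unchecked
        {
            int hash = 17;
            foreach (byte b in obj)
                hash = hash * 31 + b;
            return hash;
        }
    }
}

public static class DuplicateFinder
{
    // Returns every (hash, path) record whose hash is shared with at least one
    // other record: all members of each group of size > 1, in original order.
    public static List<KeyValuePair<byte[], string>> AllDuplicateRecords(
        IEnumerable<KeyValuePair<byte[], string>> fileHashList)
    {
        return fileHashList
            .GroupBy(pair => pair.Key, new ByteArrayComparer())
            .Where(group => group.Count() > 1)
            .SelectMany(group => group)
            .ToList();
    }
}
```

If you also need to know which files match each other, you can stop before `SelectMany` and iterate the groups instead. Each group's `Key` is the shared hash, and its elements are the matching key-value pairs.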
