I am trying to hash video files in order to get a list of duplicates. I have looked here, here, and here, which is where I got some of the code, but for some reason my method breaks at this line:

byte[] hash = md5.ComputeHash(fs);

I have tried changing the method, forcing garbage collection manually, and substituting md5.ComputeHash() with HashAlgorithm.ComputeHash(), all with no luck. Here is my code:

Main Class Code

        Console.WriteLine("Please enter a directory path :");
        string path = Console.ReadLine();

        loadFiles load = new loadFiles(path, "video");
        videoFiles video = new videoFiles(path);

        video.removeDuplicates(load.files);
        Console.WriteLine("Done");

Class that loads files into an array

    private List<string> videoExt = new List<string>() { ".mp4", ".avi", ".mkv", ".srt", ".t" };
    private string filetype;

    public loadFiles(string path, string filetype)
    {
        this.path = path;
        this.filetype = filetype;
        getFiles();
        getDirectories();
    }

    public FileInfo[] getFiles()
    {
        DirectoryInfo d = new DirectoryInfo(path);

        if (filetype == "audio")
        {
            files = d.GetFiles("*", SearchOption.AllDirectories).Where(x => audioExt.Contains(x.Extension)).ToArray();
        }
        else if (filetype == "video")
        {
            files = d.GetFiles("*", SearchOption.AllDirectories).Where(x => videoExt.Contains(x.Extension)).ToArray();
        }

        return files;
    }

Method that searches for duplicates and adds them to list

    public void removeDuplicates(FileInfo[] files)
    {
        List<byte[]> hashes = new List<byte[]>();
        List<string> duplicates = new List<string>();

        foreach (FileInfo file in files)
        {
            using (FileStream fs = file.OpenRead())
            {
                using (MD5 md5 = MD5.Create())
                {
                    byte[] hash = md5.ComputeHash(fs);

                    if (hashes.Contains(hash))
                        duplicates.Add(file.FullName);
                    else
                        hashes.Add(hash);
                }
            }
        }
    }
TH3SN3R
  • What do you mean "breaks"? How do you know that it breaks? Have you tried debugging? What happens? – Panagiotis Kanavos Sep 29 '17 at 11:20
  • BTW you *don't* need to dispose the hashing class inside the loop. That just wastes memory and CPU. Create and dispose the instance outside the loop – Panagiotis Kanavos Sep 29 '17 at 11:23
  • I stepped through the code and saw that the second time it reaches the ComputeHash line, it automatically steps out of the method and returns to the console window. In the second iteration, it skips the if statements and doesn't continue iterating. – TH3SN3R Sep 29 '17 at 11:23
  • Note that `hashes.Contains(hash)` will always return `false`, because by default `byte[]` is compared by reference, not by its contents. – C.Evenhuis Sep 29 '17 at 11:24
  • Either you have `catch {}` somewhere that hides the error that *is* thrown or there are no other files in the `files` array. There's nothing wrong with MD5 or `ComputeHash()`. Post a **reproducible** example – Panagiotis Kanavos Sep 29 '17 at 11:25
  • No repro. Running this code works without problems and fills the `hashes` list – Panagiotis Kanavos Sep 29 '17 at 11:28
  • The FileInfo array is filled with 797 video files after providing the path, but it never gets past the 2nd file. My code contains no try/catch blocks anywhere. Will post all used code now – TH3SN3R Sep 29 '17 at 11:30
  • By changing `hashes.Contains(hash)` to `hashes.Any(b=>b.SequenceEqual(hash))` I'm able to find duplicate entries. – Panagiotis Kanavos Sep 29 '17 at 11:31
  • @TH3SN3R debugging doesn't "automatically step out of the method". Either there are no more files, or there was an error. *Where* did you get the files from? How many are there? *Do* they exist? – Panagiotis Kanavos Sep 29 '17 at 11:32
  • The files are inserted into the FileInfo array by the above getFiles() method, which uses a path obtained from user input. And yes, the files do exist; there are 797 files, ranging from avi and mp4 to srt file types – TH3SN3R Sep 29 '17 at 11:36
  • I'm not talking about the files on disk. How many items does the `files` array contain? *Can* you print out their names from that array? Can you access their *streams*, or are they in use and throwing exceptions? What you described until now means that either you don't retrieve any files, or that you get an exception that you never log anywhere. *The code works* if the array comes from a simple `Directory.GetFiles` – Panagiotis Kanavos Sep 29 '17 at 11:50
  • I checked and saw that the first iteration, which completes, is done on a .t file, and the file after that is a video file, which ends the iteration. So I removed all non-video files and saw that the iteration doesn't even start, although it sees all the files. Could it be that the video files are too large, or that the FileStream can't open them? The array contains all of the items that are present in the directory, and yes, I can print their names to the console. – TH3SN3R Sep 29 '17 at 11:50
  • No, it can be that there's a problem with the code, when it runs on *your* machine. Maybe the video is opened somewhere else. You can't open a stream on a file if it's locked by another process. Maybe you have another error. Add an exception block and *log* the exception – Panagiotis Kanavos Sep 29 '17 at 11:52
  • No exceptions are caught. Can I make a screen recording and give you a link, so you can see exactly what I am seeing? – TH3SN3R Sep 29 '17 at 11:56
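
The suggestions in the comments above can be combined into one rough sketch: create the MD5 instance once outside the loop, compare hashes by content rather than by reference, and log any exception instead of letting it pass unnoticed. The class name `DuplicateFinder` and the method name `FindDuplicates` are hypothetical, chosen only for illustration. Storing each hash as a hex string in a `HashSet<string>` is one possible way to get value comparison; the `hashes.Any(b => b.SequenceEqual(hash))` approach from the comments would work as well. This does not explain why the loop stops on the asker's machine, it only addresses the issues the commenters pointed out.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Security.Cryptography;

public static class DuplicateFinder
{
    // Returns the full names of files whose content hash matches an earlier file.
    public static List<string> FindDuplicates(IEnumerable<FileInfo> files)
    {
        // Hex strings compare by value, unlike byte[], which compares by reference.
        var seen = new HashSet<string>();
        var duplicates = new List<string>();

        // One MD5 instance is reused for every file instead of one per iteration.
        using (MD5 md5 = MD5.Create())
        {
            foreach (FileInfo file in files)
            {
                try
                {
                    using (FileStream fs = file.OpenRead())
                    {
                        string hex = BitConverter.ToString(md5.ComputeHash(fs));

                        // HashSet.Add returns false when the value is already present.
                        if (!seen.Add(hex))
                            duplicates.Add(file.FullName);
                    }
                }
                catch (Exception ex) when (ex is IOException || ex is UnauthorizedAccessException)
                {
                    // Log the failure (for example, a file locked by another process)
                    // and continue with the next file.
                    Console.Error.WriteLine($"Skipped {file.FullName}: {ex.Message}");
                }
            }
        }

        return duplicates;
    }
}
```

With the classes from the question, this could be called as `List<string> duplicates = DuplicateFinder.FindDuplicates(load.files);`. If a file cannot be opened, its name and the error message are written to standard error, which should make a locked or inaccessible video file visible.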

0 Answers