
I am creating a program that grabs files from a Result Directory (the original folder) and moves them to a Working Directory (another folder). The file's name is changed when it is moved from one directory to the other. Before I move a file, I need to check that the Working Directory does not already contain it. Since the file name changes, I need a way to check whether the file already exists based on its content.

Let's say I have:

FilesRD - (The files in the Original Folder/Result Directory)
FilesWD - (The files in the Other Folder/Working Directory)

and the files in those directories look like this:

Before (in Result Directory):
Log_123.csv

After (in Working Directory):
Log_123_2015_24_6.csv

kjhughes
JP Garza
  • If the files are identical, you could get a hash/CRC of the file contents and compare them. Quite a broad question for SO, tbh. – Martin Jun 24 '15 at 15:00
  • I imagine you will need a file content comparison between all the files in the target directory and the current file. This could take a lot of time, depending on the number and size of the files. This could be a starting point: [How to create a File-Compare function in Visual C#](https://support.microsoft.com/en-us/kb/320348) – Habib Jun 24 '15 at 15:00
  • I came here to say the same as @MartinParkin: you can try hashing the content of the files, and if the results are the same, then you'll know. – fabricio Jun 24 '15 at 15:01
  • If it is a single program, I assume you have access to all aspects of it (or do you use external stuff?). Can't you keep track of which files have been moved already, or at least what the new name of each file is? – T_D Jun 24 '15 at 15:02
  • Does the file in the working directory always start with the original name? Which program does the renaming? – CodeCaster Jun 24 '15 at 15:02
  • If only your software can change file names, and if you follow a common naming convention, you can easily recover the old file name (in your example, by removing the date). – SimonGates Jun 24 '15 at 15:02
  • The file in the Working Directory does not always start with the same name, but it could. So I added the UTC time so I can tell them apart if that ever happens. And I'll look into hashing. Thanks. – JP Garza Jun 24 '15 at 15:17
  • A deterministic mapping between the filenames would be best, or otherwise a lookup table somewhere. If you do use hashes, see if you can keep them around, as generating a hash still must read a file in its entirety. If that's not possible, and depending on the content and size of your files, you may want to read files piece by piece, comparing the pieces against the original file as you go. This allows you to bail out as soon as you find a difference, which can be more efficient (unless files only differ near their ends). – Pieter Witvoet Jun 24 '15 at 15:32
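The piece-by-piece comparison suggested in the last comment could look roughly like the sketch below. It is not code from the thread; the class name `ChunkedFileComparer`, its methods and the 80 KB default buffer size are illustrative assumptions. It compares file lengths first (free, no content read), then reads both files block by block and returns at the first differing byte.

```csharp
using System;
using System.IO;
using System.Linq;

// Hypothetical helper (not from the thread): early-exit, chunked file comparison.
public static class ChunkedFileComparer
{
    public static bool HaveSameContent(string pathA, string pathB, int bufferSize = 81920)
    {
        if (bufferSize <= 0)
            throw new ArgumentOutOfRangeException(nameof(bufferSize));

        // Cheap check first: files of different length cannot be identical.
        if (new FileInfo(pathA).Length != new FileInfo(pathB).Length)
            return false;

        var bufferA = new byte[bufferSize];
        var bufferB = new byte[bufferSize];

        using (var streamA = File.OpenRead(pathA))
        using (var streamB = File.OpenRead(pathB))
        {
            while (true)
            {
                int readA = ReadFully(streamA, bufferA);
                int readB = ReadFully(streamB, bufferB);

                if (readA != readB)
                    return false;   // lengths diverged (e.g. a file changed while reading)
                if (readA == 0)
                    return true;    // both streams exhausted without a difference

                for (int i = 0; i < readA; i++)
                {
                    if (bufferA[i] != bufferB[i])
                        return false;   // bail out at the first differing byte
                }
            }
        }
    }

    // True if any file in the working directory has the same content as fileToCopy.
    public static bool ExistsIn(string workingDirectory, string fileToCopy) =>
        Directory.EnumerateFiles(workingDirectory).Any(f => HaveSameContent(fileToCopy, f));

    // Stream.Read may return fewer bytes than requested, so keep reading
    // until the buffer is full or the end of the stream is reached.
    private static int ReadFully(Stream stream, byte[] buffer)
    {
        int total = 0;
        while (total < buffer.Length)
        {
            int read = stream.Read(buffer, total, buffer.Length - total);
            if (read == 0)
                break;
            total += read;
        }
        return total;
    }
}
```

The length check means most non-duplicates are rejected without reading any content; a full read only happens for files of equal size, and even then it stops at the first mismatch.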

3 Answers


I would try the following function, though it's far from perfect:

    private bool CheckIfFileAlreadyExist(string WorkingDirectory, string FileToCopy)
    {
        string FileToCheck = File.ReadAllText(FileToCopy);
        foreach (string CurrentFile in Directory.GetFiles(WorkingDirectory))
        {
            if (File.ReadAllText(CurrentFile) == FileToCheck)
                return true;
        }
        return false;
    }

UPDATE:

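Another way is to read the files as byte arrays, which solves the problem with images and other binary files. However, the function still gets slower as the Working Directory fills up, because every file in it is read on each call.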

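    // Requires: using System.IO; using System.Linq;
    private bool CheckIfFileAlreadyExist(string WorkingDirectory, string FileToCopy)
    {
        byte[] FileToCheck = File.ReadAllBytes(FileToCopy);
        foreach (string CurrentFile in Directory.GetFiles(WorkingDirectory))
        {
            // SequenceEqual compares the array contents; == would only compare
            // array references and therefore never be true here.
            if (File.ReadAllBytes(CurrentFile).SequenceEqual(FileToCheck))
                return true;
        }
        return false;
    }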
Xanatos
  • Although this appears to solve the OP's problem, doing a textual comparison of the files may be exceptionally slow. You'd be better off creating a hash of the file contents and comparing it. – Martin Jun 24 '15 at 15:06
  • Out of curiosity, what happens if the files are images? – user1666620 Jun 24 '15 at 15:08
  • @user1666620 This would fail, as it assumes that they are text. Hence the use of a hash :) – Martin Jun 24 '15 at 15:08
  • You'll want to use `Enumerable.SequenceEqual` here instead of `==`, because you want to compare their content, not their identity. – Pieter Witvoet Jun 24 '15 at 15:26

You need to check the destination folder using the System.IO namespace, for example:

    // Requires: using System.IO;
    string destination = @"c:\myfolder\";   // verbatim string, so the backslashes are not escapes
    string[] files = Directory.GetFiles(destination, "Log_123*");
    if (files.Length == 0)
    {
        // move the file to the directory
    }

You can pass a search pattern to `GetFiles`; it returns only the files whose names match the pattern.
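If the renaming always follows the convention from the question (original name, then `_` and a date, then the same extension), the search pattern can be derived from the original file name instead of being hard-coded. This is a minimal sketch under that assumption; the class and method names are hypothetical. It is only a name-based heuristic: as the comments point out, it does not compare contents, and the pattern `Log_123_*.csv` would also match the renamed copy of a different file such as `Log_123_4.csv`.

```csharp
using System;
using System.IO;

// Hypothetical helper: assumes working copies are named "<original>_<date><ext>".
public static class RenamedFileLookup
{
    // "Log_123.csv" -> "Log_123_*.csv"
    public static string BuildPattern(string originalPath)
    {
        string name = Path.GetFileNameWithoutExtension(originalPath);
        string extension = Path.GetExtension(originalPath);
        return name + "_*" + extension;
    }

    // True if the working directory holds a file matching the assumed naming convention.
    public static bool RenamedCopyExists(string workingDirectory, string originalPath) =>
        Directory.GetFiles(workingDirectory, BuildPattern(originalPath)).Length > 0;
}
```

Used this way, a match only tells you that a copy probably exists; a content comparison is still needed to be sure.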

Proxytype
  • The question is about finding duplicate file content, not file names. – user1666620 Jun 24 '15 at 15:09
  • This does not ascertain whether the file __contents__ are the same which is what the OP asked for. – Martin Jun 24 '15 at 15:09
  • If you want to check by content, you need to compute an MD5 hash for each file and compare it with the new file's MD5; if there is a match, that content already exists... http://stackoverflow.com/questions/10520048/calculate-md5-checksum-for-a-file – Proxytype Jun 24 '15 at 15:13

Try comparing the files' hashes. Here's an example:

        // Requires: using System; using System.IO; using System.Linq;
        //           using System.Security.Cryptography;

        // Returns the file's MD5 hash as a lowercase hex string.
        private static string GetFileMD5(string fileName) {
            using (var md5 = MD5.Create()) {
                using (var fileStream = File.OpenRead(fileName)) {
                    return BitConverter.ToString(md5.ComputeHash(fileStream)).Replace("-", "").ToLower();
                }
            }
        }

        // True if any file in workingDir has the same MD5 hash as fileName.
        private static bool DoesFileExist(string workingDir, string fileName) {
            var fileToCheck = GetFileMD5(fileName);
            var files = Directory.EnumerateFiles(workingDir);
            return files.Any(file => string.Compare(GetFileMD5(file), fileToCheck, StringComparison.OrdinalIgnoreCase) == 0);
        }
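Since `DoesFileExist` rehashes every file in the working directory on each call, one option (following the "keep the hashes around" comment on the question) is to hash the directory once and keep the results in a `HashSet<string>`. This is a sketch, not part of the answer above; the class `WorkingDirectoryHashIndex` and its methods are hypothetical names. It relies on `HashSet<T>.Add` returning `false` when the value is already present.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Security.Cryptography;

// Hypothetical helper: hashes the working directory once, then answers
// "is this content already there?" by hashing only the incoming file.
public sealed class WorkingDirectoryHashIndex
{
    private readonly HashSet<string> _hashes = new HashSet<string>(StringComparer.Ordinal);

    public WorkingDirectoryHashIndex(string workingDirectory)
    {
        foreach (var file in Directory.EnumerateFiles(workingDirectory))
            _hashes.Add(ComputeMd5(file));
    }

    // True if a file with the same MD5 hash is already indexed.
    public bool ContainsContentOf(string fileName) => _hashes.Contains(ComputeMd5(fileName));

    // Records the file's hash; returns false if that content was already indexed.
    public bool AddIfNew(string fileName) => _hashes.Add(ComputeMd5(fileName));

    private static string ComputeMd5(string fileName)
    {
        using (var md5 = MD5.Create())
        using (var stream = File.OpenRead(fileName))
        {
            return BitConverter.ToString(md5.ComputeHash(stream)).Replace("-", "");
        }
    }
}
```

A possible flow is `if (index.AddIfNew(source)) File.Move(source, target);`, so each incoming file is read exactly once. Equal MD5 hashes mean identical content only with very high probability; if that matters, confirm a match with a byte-by-byte comparison before skipping the file.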
Jyrka98