2

I need a way to determine if a user has copied a file to a new location.

Example: You have two computers and you copy file.txt from C:\Temp\ on computer1 to C:\Temp\ on computer2.

Is there an ID associated with these two files, based on their location, that will help me determine if this file has moved?

Update: After some discussion, here is the resulting code. It determines whether a file has been copied by creating a Guid from the file path and creation time. The resulting Guid can be compared to a stored Guid to determine whether the file has been copied.

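using System;
using System.IO;
using System.Security.Cryptography;
using System.Text;

FileInfo fi = new FileInfo("C:\\Temp\\temp.txt");
// Composite of the full path and the creation time (formatted with the current culture).
string filePathCreationComposite = String.Format("{0}{1}", Path.GetFullPath(fi.FullName), fi.CreationTime);

using (MD5 md5 = MD5.Create())
{
   // An MD5 digest is 16 bytes, which is exactly the size a Guid needs.
   byte[] hash = md5.ComputeHash(Encoding.Default.GetBytes(filePathCreationComposite));
   Guid result = new Guid(hash);
}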
jtoddcs
  • I think you can get a unique `Hash code` of your file path to compare them later. – Paviel Kraskoŭski Oct 28 '16 at 15:43
  • why not just compare the paths for equality? – KMoussa Oct 28 '16 at 15:56
  • Because C:\Temp\File1.txt is equivalent to C:\Temp\File1.txt. This file has been copied from computer1 to the same directory and name as computer2. I need to know that this file has been copied and is effectively in a new location. @Pavieł Kraskoŭski I think your solution will work for what I need. Do you know how to get that unique hash code for the file path? – jtoddcs Oct 28 '16 at 16:00
  • The only way to do what you say I can think of is if you are handling those copies via your application. If not, I don't think the OS is going to keep track of all the copies over a file at all... – jorgonor Oct 28 '16 at 16:04
  • @jtoddcs - That isn't the way the code snippet works. It creates a unique hash for the path and creation time, and it doesn't consider the file content at all. So you could drop a corrupted file on computer2 and wouldn't notice it. Also, the formatted creation time is language dependent: if your target computer has a different locale, it would create a different hash even if the creation date is the same. Not a good idea in my opinion. Calculate the hash from the file content and store it along with the path to determine if the file is the same (and that no transmission error corrupted the file). – Matt Nov 08 '22 at 12:29

2 Answers

2

If you cannot control the file copy (i.e. be 100% sure that the user does it under your control, e.g. inside your UI), the only way to check this is to compare the contents. As far as I know, there is no trustworthy file ID that stays the same across machines and that you could compare.

Of course, this does not guarantee that the user copied the file from A to B, since a file with exactly the same content could have come from anywhere else, but I think you really only need to verify that the files in the two locations are the same.

One easy way to compare the content of a file (and to store its "id") is to calculate a hash. SHA-256 fits that purpose well.

Here is an example:

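using System;
using System.IO;
using System.Security.Cryptography;

// Returns the SHA-256 hash of the file's content as an uppercase hex string.
string Hash(string file)
{
    using (FileStream stream = File.OpenRead(file))
    using (SHA256 sha = SHA256.Create()) // SHA256Managed is obsolete in current .NET
    {
        byte[] checksum = sha.ComputeHash(stream);
        // BitConverter separates bytes with dashes; strip them.
        return BitConverter.ToString(checksum).Replace("-", String.Empty);
    }
}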

You don't need a string, of course. This is not a very efficient approach; you can read more about it here and here. The first question suggests that SHA-512 is, for some reason, faster to calculate; I don't know whether that is true.

P.S. It's worth noting that even the smallest change to the original file means the hashes won't match anymore.

P.P.S. You can use any faster hash like MD5 or some other one (MD5 is not cryptographically secure, but this should not matter for your case).
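
To make the comparison itself concrete, here is a minimal sketch that hashes two files and compares the digests directly as bytes, without going through a string. The helper names `HashFile` and `SameContent` are made up for this illustration; treat it as one possible shape rather than the only way to do it.

```csharp
using System.IO;
using System.Linq;
using System.Security.Cryptography;

// Hypothetical helper: SHA-256 digest of a file's content, read as a stream.
static byte[] HashFile(string path)
{
    using (FileStream stream = File.OpenRead(path))
    using (SHA256 sha = SHA256.Create())
    {
        return sha.ComputeHash(stream);
    }
}

// Hypothetical helper: true when both files produce byte-identical digests.
static bool SameContent(string pathA, string pathB)
{
    return HashFile(pathA).SequenceEqual(HashFile(pathB));
}
```

In practice you would probably store the digest of the original file once and compare later copies against the stored value instead of rehashing both files each time.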

Matt
Ilya Chernomordik
  • Note: This computes a hash based on the content of the file, not on the path. So you can compare files at different path locations via their hash sum, which is better than hashing the path itself (as suggested by the question). – Matt Nov 08 '22 at 09:38
-2

You can do the following:

  • Check if sizes are different. If so, the file has changed.
  • If size has not changed, check if creation date/time or modified date/time is different between the two files. If so, the new file is likely different. See https://learn.microsoft.com/en-us/dotnet/api/system.io.fileinfo?view=net-7.0. This is the same method most FTP programs use when uploading (e.g. FileZilla). May be good enough for your use case. If you want more, continue to the next point.
  • For extra thoroughness, you could compute the MD5 of the two files. This might be overkill, but matching hashes make it all but certain that the files have not changed. See https://learn.microsoft.com/en-us/dotnet/api/system.security.cryptography.md5?view=net-7.0. MD5 is very fast to compute and accidental collisions are highly unlikely. You could upgrade to SHA-256, but that would be slower to compute and not likely to get you any benefit given that you already did a size check and date/time comparison.
  • Unless you have files that are measured in gigabytes for size, MD5 will be very fast.
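
The size-then-hash part of the steps above could be sketched roughly like this. `FilesDiffer` is a hypothetical name, and the timestamp check is left out on the assumption that a copy may reset the dates.

```csharp
using System.IO;
using System.Linq;
using System.Security.Cryptography;

// Hypothetical helper: cheap size check first, MD5 of the content only if sizes match.
static bool FilesDiffer(string pathA, string pathB)
{
    // Different lengths mean different content; no need to read either file.
    if (new FileInfo(pathA).Length != new FileInfo(pathB).Length)
    {
        return true;
    }

    using (MD5 md5 = MD5.Create())
    using (FileStream a = File.OpenRead(pathA))
    using (FileStream b = File.OpenRead(pathB))
    {
        // ComputeHash resets the algorithm after each call, so one instance can hash both files.
        byte[] hashA = md5.ComputeHash(a);
        byte[] hashB = md5.ComputeHash(b);
        return !hashA.SequenceEqual(hashB);
    }
}
```

Because the length comparison needs only file metadata, most changed files are rejected without reading any content.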

I will say that using the date/time method is a good way of knowing if a file is newer. If for some reason an older version of a file got placed in the new computer folder, the MD5 compare or size compare would just tell you that it's different, not that it's an older version.

As long as you are MOVING files around, the creation date is usable to determine if a file is newer. If you are COPYING files, you must include the size check followed by an MD5 check for files that are the same size. See https://superuser.com/questions/1748898/how-to-copy-file-folder-and-preserve-their-creation-date-on-windows-10.

Hope this helps :)

jjxtra
  • I really like this idea. I think it will solve my problem to create a composite guid based on the changed date and the file path. `FileInfo fi = new FileInfo("C:\\Temp\\temp.txt"); string filePathCreationComposite = String.Format("{0}{1}", Path.GetFullPath(fi.FullName), fi.CreationTime); using (MD5 md5 = MD5.Create()) { byte[] hash = md5.ComputeHash(Encoding.Default.GetBytes(filePathCreationComposite)); Guid result = new Guid(hash); }` – jtoddcs Oct 31 '16 at 22:33
  • @jtoddcs thanks. If it works out would love an up vote and accepted answer ;) – jjxtra Nov 01 '16 at 14:11
  • @jjxtra - This isn't a good way to go. – Enigmativity Nov 08 '22 at 09:38
  • @Enigmativity Would love some insight as to why it's not a good way to go – jjxtra Mar 07 '23 at 17:04
  • @jjxtra - I'm trying to think why it was a bad idea when I looked at it, but the key thing that jumps out at me is that Creation Date gets updated when you copy a file anyway, so you need to use Modified Date. Also, it's probably a good idea to do a hash of the first `x` number of bytes of the file and only hash the entire file if the initial hash collides. – Enigmativity Mar 07 '23 at 20:58