I have written a web application in C#. One of its features is file attachment: users can attach videos, photos, documents, and so on. Some of these files are duplicates and some are large, so over time they take up a lot of space. I would like the program to work smarter: recognize which files are duplicates, reuse the previously stored copy, and even report on frequently used files.
For example, I wrote an extension method that combines the file content with the file's MIME type to create a unique string:
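    using System;
    using System.IO;
    using System.Linq;
    using System.Security.Cryptography;
    using System.Text;

    // Extension methods must live in a static class; the class name is arbitrary.
    public static class StreamExtensions
    {
        public static string CreateFileKey(this Stream file, string mimeType)
        {
            if (file is null)
                throw new ArgumentNullException(nameof(file));
            if (file.Length == 0)
                throw new ArgumentException("File is empty.", nameof(file));
            if (string.IsNullOrWhiteSpace(mimeType))
                throw new ArgumentException("MIME type is required.", nameof(mimeType));
            // Requires a seekable stream; hash from the start of the content.
            file.Seek(0, SeekOrigin.Begin);
            using var hashAlgorithm = MD5.Create();
            // ComputeHash(Stream) reads in chunks and leaves the caller's stream open.
            var hashedFile = hashAlgorithm.ComputeHash(file);
            // Hash the MIME type bytes together with the content hash.
            var mimeTypeBytes = Encoding.ASCII.GetBytes(mimeType);
            var trustedDataForHashing = mimeTypeBytes.Concat(hashedFile).ToArray();
            var result = hashAlgorithm.ComputeHash(trustedDataForHashing);
            return Convert.ToBase64String(result);
        }
    }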
With this key, I first check whether a file with the same key has already been saved, and only then decide whether to store the new file.
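To make the intended check concrete, here is a minimal sketch of the "look up the key first, save only if it is new" step. It assumes a hypothetical in-memory `AttachmentIndex` class standing in for whatever database table the application really uses; the class, its method names, and the `saveFile` callback are illustrative assumptions, not existing code. It also keeps a use count per key, which is one possible basis for the report on frequently used files.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory index (an assumption for illustration). A real
// application would more likely use a database table with a unique
// constraint on the key column.
public sealed class AttachmentIndex
{
    // key -> (path of the first saved copy, number of times the file was attached)
    private readonly Dictionary<string, (string Path, int UseCount)> _entries =
        new Dictionary<string, (string Path, int UseCount)>();

    // If the key is already known, bump its use count and return the existing path.
    // Otherwise call saveFile to store the content and register the returned path.
    public string GetOrAdd(string fileKey, Func<string> saveFile)
    {
        if (_entries.TryGetValue(fileKey, out var entry))
        {
            _entries[fileKey] = (entry.Path, entry.UseCount + 1);
            return entry.Path;
        }

        var path = saveFile();
        _entries[fileKey] = (path, 1);
        return path;
    }

    // Most frequently attached files first, for reporting.
    public IEnumerable<(string Key, int UseCount)> MostUsed(int count) =>
        _entries.OrderByDescending(e => e.Value.UseCount)
                .Take(count)
                .Select(e => (e.Key, e.Value.UseCount));
}
```

A call might then look like `index.GetOrAdd(stream.CreateFileKey(mimeType), () => SaveToDisk(stream))`, where `SaveToDisk` is a hypothetical method that writes the stream and returns its storage path. Note that this sketch is not thread-safe; in a web application the uniqueness check would probably have to be enforced by the database.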
Is it a good solution to use one of the hash algorithms to generate a unique value for each file?