<input id='f' name='f' type='file' multiple />

I am allowing the user to select multiple files (from different folders) for upload. I build the list of selected files as described here. Basically the list of selected files is maintained outside of the INPUT control and stuffed back into f.files at submit time.
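A minimal sketch of that bookkeeping, assuming a browser that supports the `DataTransfer` constructor. The names `selectedFiles`, `rememberSelection`, and `writeBackToInput` are made up for illustration and may differ from the approach in the linked description:

```javascript
// Hypothetical module-level list that outlives individual selections.
const selectedFiles = [];

// Append the files from the latest selection to the external list.
function rememberSelection(input) {
    selectedFiles.push(...Array.from(input.files));
}

// Rebuild input.files from the external list just before submitting.
// DataTransfer is a browser API, so this only works in a browser.
function writeBackToInput(input, files) {
    const transfer = new DataTransfer();
    for (const file of files) {
        transfer.items.add(file);
    }
    input.files = transfer.files;
}
```

In a page this might be wired up with `f.addEventListener('change', () => rememberSelection(f))` and a call to `writeBackToInput(f, selectedFiles)` in the form's submit handler.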

The list of selected files is built from the File objects in f.files each time the user selects one or more files.

So far this works well, except that I cannot detect when the same file has been selected more than once. Each File in f.files exposes only the file name, not the full path.

I tried URL.createObjectURL on the File object, but it returns a different URL each time, even for the same file.

(In Chrome, the upload control will not fire the change event if the same file was selected in succession. But this isn't sufficient for me, as the user can select file A, then file B, and then file A again.)

How do I identify duplicate files from the File object?

Old Geezer
  • Can you post some of the code you have. For example the `onChange` function etc. – Akrion Aug 30 '18 at 04:36
  • Possible workaround: Check file name / size / lastModified properties and compare them to identify as possible duplicate. Comparing BLOB objects (URL.createObjectURL) as an additional filter you can use. – jeetaz Aug 30 '18 at 05:38

1 Answer

You can use FileReader.readAsDataURL() to read the contents of each file.

Then you can compare the file contents along with the other properties of each file, including File.lastModified, File.name, File.size, and File.type to determine if the file is a duplicate.

Full Example:

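const file_input = document.getElementById( 'f' );

// Keep the list outside the handler so it persists across selections;
// otherwise selecting A, then B, then A again would go undetected.
const file_compare = [];

file_input.addEventListener( 'change', () =>
{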
    
    Array.from( file_input.files ).forEach( file =>
    {
        const file_reader = new FileReader();
        
        file_reader.readAsDataURL( file );
        
        file_reader.addEventListener( 'load', () =>
        {
            const file_exists = file_compare.find( existing_file =>
                
                   existing_file.lastModified === file.lastModified
                && existing_file.name         === file.name
                && existing_file.size         === file.size
                && existing_file.type         === file.type
                && existing_file.content      === file_reader.result
                
            ) !== undefined;
            
            if ( file_exists )
            {
                console.log( 'Error:', file.name, 'is a duplicate file.' );
            }
            
            else
            {
                file_compare.push(
                {
                    'lastModified' : file.lastModified,
                    'name'         : file.name,
                    'size'         : file.size,
                    'type'         : file.type,
                    'content'      : file_reader.result
                });
            }
        });
    });
});
<input id="f" name="f" type="file" multiple />

Note: If you are dealing with very large files, you can skip reading the contents and compare only File.lastModified, File.name, File.size, and File.type. This is much faster, at the cost of a small chance of a false match.
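The metadata-only comparison could be sketched roughly as follows. The helper names `metadataKey` and `splitDuplicates` are hypothetical, and the sketch assumes that two files with identical name, size, lastModified, and type can be treated as the same file:

```javascript
// Hypothetical helper: builds a comparison key from File metadata only.
// JSON.stringify avoids collisions if a name contains a separator character.
function metadataKey(file) {
    return JSON.stringify([file.name, file.size, file.lastModified, file.type]);
}

// Splits candidates into files not seen before and probable duplicates.
// seenKeys is a Set that the caller keeps alive across selections.
function splitDuplicates(seenKeys, candidates) {
    const fresh = [];
    const duplicates = [];
    for (const file of candidates) {
        const key = metadataKey(file);
        if (seenKeys.has(key)) {
            duplicates.push(file);
        } else {
            seenKeys.add(key);
            fresh.push(file);
        }
    }
    return { fresh, duplicates };
}
```

Because the Set lives outside the change handler, selecting file A, then file B, then file A again would report the second A as a probable duplicate. No file contents are read, so memory use stays flat regardless of file size.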

Grant Miller
  • Do the size & lastModified checks before going through the reader. There is **very** little chance that two different files will match on both, and loading every file as a data URL will blow up your memory if you are dealing with gigabytes of data. – Kaiido Sep 03 '18 at 03:37
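One way to follow that advice is a two-stage check that reads contents only for files whose metadata already matches. This is a sketch under assumptions, not the answer's code: the function names are hypothetical, and it uses `Blob.arrayBuffer()` in place of `FileReader.readAsDataURL()` so the bytes can be compared directly.

```javascript
// Cheap first stage: compare metadata only, without reading any contents.
function sameMetadata(a, b) {
    return a.size === b.size && a.lastModified === b.lastModified;
}

// Expensive second stage: byte-for-byte comparison of both files.
// Note that this still loads both files fully into memory.
async function sameContent(a, b) {
    const [bufferA, bufferB] = await Promise.all([a.arrayBuffer(), b.arrayBuffer()]);
    const bytesA = new Uint8Array(bufferA);
    const bytesB = new Uint8Array(bufferB);
    if (bytesA.length !== bytesB.length) {
        return false;
    }
    return bytesA.every((byte, index) => byte === bytesB[index]);
}

// Reads contents only when the metadata check has already matched.
async function isDuplicate(file, existingFiles) {
    for (const existing of existingFiles) {
        if (sameMetadata(file, existing) && await sameContent(file, existing)) {
            return true;
        }
    }
    return false;
}
```

For most selections the first stage rules out every candidate, so nothing is read at all. For very large files even the second stage may be too costly, in which case the metadata check alone may have to do.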