How can I detect that the name of a file that a user has provided for upload (via a django.forms.ModelForm using a FileField field) is a duplicate of one that exists, and thus decide to fail validation on the form?

I'm finding this particularly challenging, because from within the form, I don't see how I can find out what the value of upload_to is for this FileField, so I can't go looking myself in the file system to see if that file is there already.

GreenAsJade
  • Note: I want to prevent the duplicate file being uploaded with a differentiated name, which is what django would do if I don't check. – GreenAsJade Dec 01 '14 at 11:47
  • maybe some checksum like md5, http://stackoverflow.com/questions/5055143/will-changing-a-file-name-affect-the-md5-hash-of-a-file – madzohan Dec 01 '14 at 12:05
  • @madzohan Thanks, right - so that will help determine if the files are the same, once I have two files to compare, but how do I find out whether there is a pre-existing file in the filesystem already that the name in the form will overwrite, in the first place? – GreenAsJade Dec 01 '14 at 12:19
  • Add a migration that gives your model a new `checksum` column. I don't think you need to submit the files with the form; it would be better to fire an AJAX request when the input changes, to a view that does the comparison ... There is something similar at http://stackoverflow.com/questions/15885201/django-uploads-discard-uploaded-duplicates-use-existing-file-md5-based-check – madzohan Dec 01 '14 at 13:02

2 Answers

As I see it, you have two options:

Store your 'upload_to' value in settings.py and read it back when you are validating. Something like this would work (you would need to change your upload_to to use the setting, of course):

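from django.conf import settings

# UPLOAD_TO is a custom setting you define yourself; Django does not provide it.
if settings.UPLOAD_TO:
    # Look for the submitted file name under this directory
    pass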

The drawback is that you can't have subfolders or anything more complex there.
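
A minimal sketch of that first option, assuming a fixed upload folder under the media root. The function name `upload_name_exists` and its parameters are hypothetical; in a Django project, `media_root` would typically be `settings.MEDIA_ROOT` and `upload_to` the custom setting described above.

```python
import os


def upload_name_exists(media_root, upload_to, filename):
    """Return True if a file with this name is already in the upload folder.

    All three arguments are assumptions for illustration: media_root is the
    storage root, upload_to is a fixed relative folder (not a callable).
    """
    # Keep only the base name so a client-supplied path cannot point
    # outside the upload folder.
    safe_name = os.path.basename(filename)
    return os.path.isfile(os.path.join(media_root, upload_to, safe_name))
```

Inside a form's validation you could call this with the submitted file's name and raise `ValidationError` when it returns True. Because it only handles a fixed folder, it shares the limitation noted above.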

A second option, as mentioned in the comments on your question, is to add a new column to your model that holds a hash of the file. This approach should work better. As also mentioned there, to avoid the cycle of uploading a big file, checking, failing, and uploading another big file, you can hash the file on the client and verify it via AJAX first (you will still verify it again on the server, but this can make things faster for your users).
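
The server-side half of the hash approach might look roughly like this. It is a sketch that assumes the upload is a file-like object supporting `read()` and `seek()`; the helper name `file_checksum` and the choice of SHA-256 are illustrative, not something the answer prescribes.

```python
import hashlib


def file_checksum(fileobj, chunk_size=64 * 1024):
    """Hash a file-like object in chunks so large uploads are not loaded whole."""
    digest = hashlib.sha256()
    # Read fixed-size chunks until read() returns an empty bytes object.
    for chunk in iter(lambda: fileobj.read(chunk_size), b""):
        digest.update(chunk)
    # Rewind so the file can still be saved after hashing.
    fileobj.seek(0)
    return digest.hexdigest()
```

The resulting hex string is what you would store in the new column and compare against during validation. Note that this detects identical contents, not identical names.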

cdvv7788
  • Can I clarify something? Having a hash for the file is (AIUI) a means to detect whether the file contents are the same. But my problem is "simpler" ... I just need to detect whether the user is trying to upload a file of the same name as one already uploaded. I do have subfolders in upload_to: upload_to is a callable that puts the files in a subfolder by uploader name. Right now I'm not concerned with failing a big file after it's uploaded, as long as I can prevent a duplicate upload. – GreenAsJade Dec 01 '14 at 13:52
  • So, keep it simple. Put the names of your files in the new column (I guess it would be added to the user model, something like user_files), comma separated, and then compare against that in validation. – cdvv7788 Dec 01 '14 at 13:55
  • Would I need to explicitly search, something like `if Thingoes.objects.get(file_name = this_file_name): raise ValidationError()` ? – GreenAsJade Dec 01 '14 at 14:02
  • Check the docs for validation errors (https://docs.djangoproject.com/en/dev/ref/forms/validation/#raising-validationerror), other than that it should work. – cdvv7788 Dec 01 '14 at 14:06
  • Starting to look simple, and promising. Is there a way to ask "Are there any Thingoes whose file_attachment.name == this_file_name?" and thus avoid the extra column duplicating the file name? – GreenAsJade Dec 01 '14 at 14:09
  • If your callable uses separate folders for users, you may use the FileField url (https://docs.djangoproject.com/en/dev/ref/models/fields/#django.db.models.fields.files.FieldFile.url). Your file structure would be in charge of differentiating users, and that would be a very trivial comparison (you would need to build the url from scratch for the unsaved file though, but that should not be hard). – cdvv7788 Dec 01 '14 at 14:16
  • Downvoted as this doesn't seem to use Django standard practices or conventions. – paul Sep 13 '17 at 23:15
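
The name comparison discussed in these comments can be sketched without the extra column: build the name the file would be stored under, then look for it among the names already stored. Everything here is hypothetical, including `user_directory_path` (a stand-in for the asker's per-uploader callable, simplified to take the uploader's name instead of a model instance) and `is_duplicate_name`.

```python
import posixpath


def user_directory_path(uploader, filename):
    # Hypothetical stand-in for an upload_to callable: one subfolder per
    # uploader. A real Django callable receives (instance, filename).
    return posixpath.join("uploads", uploader, filename)


def is_duplicate_name(uploader, filename, stored_names):
    """Return True if this uploader already has a stored file with this name."""
    # Build the relative path exactly as the upload_to callable would.
    candidate = user_directory_path(uploader, posixpath.basename(filename))
    return candidate in stored_names
```

In a Django form, the membership test would presumably become a query such as `Thingoes.objects.filter(file_attachment=candidate).exists()`, since a FileField column stores the relative path as a string. Unlike `.get()`, `.exists()` returns False rather than raising an exception when nothing matches.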

Older question, but Django 1.11 now supports the unique option on FileField. Set unique=True on your field declaration on your model.

It shouldn't matter what you are setting upload_to to. The file name will still be stored in the database.

Changed in Django 1.11: In older versions, unique=True can’t be used on FileField.

https://docs.djangoproject.com/en/1.11/ref/models/fields/#unique

paul