6

I've been trying to work out the 'best practices' way to manage file uploads with Turbogears 2 and have thus far not really found any examples. I've figured out a way to actually upload the file, but I'm not sure how reliable it us.

Also, what would be a good way to get the uploaded files name?

    file = request.POST['file']
    permanent_file = open(os.path.join(asset_dirname,
        file.filename.lstrip(os.sep)), 'w')
    shutil.copyfileobj(file.file, permanent_file)
    file.file.close()
    this_file = self.request.params["file"].filename 
    permanent_file.close()

So assuming I'm understanding correctly, would something like this avoid the core 'naming' problem? id = UUID.

    file = request.POST['file']
    permanent_file = open(os.path.join(asset_dirname,
        id.lstrip(os.sep)), 'w')
    shutil.copyfileobj(file.file, permanent_file)
    file.file.close()
    this_file = file.filename
    permanent_file.close()
William Chambers
  • 499
  • 2
  • 13
  • 1
    Yes, using `uuid1().hex` or `uuid4().hex` will resolve the naming issue and most security issues. You don't need to call `lstrip()` on the uuid (don't use `id` as a variable name - it masks the builtin `id()`). So, use `uuid` to generate a unique name and copy the uploaded data to a file of that name in your upload directory. If you need to store the user supplied file name, save this as meta data, perhaps in your database. There are additional security issues that you'll face, several of which are described here: http://www.scanit.be/uploads/php-file-upload.pdf – mhawke Mar 08 '10 at 00:57
  • Aye, thanks for your help mhawke. Great read. Outta curiosity, is there any real reason to use uuid4().hex over just uuid4()? The former generates a slightly more human friendly UUID for using in URL's and the like. (I'm thinking of generating two UUID's, one for the download URL and a second for the actual filename/ID. – William Chambers Mar 08 '10 at 08:28
  • Other than the length of the filename, `uuid4().hex` vs. `str(uuid4())` is more or less the same. I don't see any advantage in having independent UUIDs for URL and filename as you'll now need an extra layer to map the UUID in the URL to an actual file. – mhawke Mar 08 '10 at 23:17

6 Answers6

3

I just want anyone who comes here looking for answers, to know that Allesandro Molina's great library Depot constitutes the best answer to this question.

It solves both the naming and copying issues, and it will incorporate nicely into your TurboGears application. You can use it with MongoDB GridFS, as in this example:

from depot.manager import DepotManager

# Configure a *default* depot to store files on MongoDB GridFS
DepotManager.configure('default', {
    'depot.backend': 'depot.io.gridfs.GridFSStorage',
    'depot.mongouri': 'mongodb://localhost/db'
})

depot = DepotManager.get()

# Save the file and get the fileid
fileid = depot.create(open('/tmp/file.png'))

# Get the file back
stored_file = depot.get(fileid)
print stored_file.filename
print stored_file.content_type

or you can easily create attachment fields in your SQLAlchemy models, like:

from depot.fields.sqlalchemy import UploadedFileField

class Document(Base):
    __tablename__ = 'document'

    uid = Column(Integer, autoincrement=True, primary_key=True)
    name = Column(Unicode(16), unique=True)

    content = Column(UploadedFileField)

… and then, storing documents with attached files (the source can be a file or bytes) becomes as easy as:

doc = Document(name=u'Foo', content=open('/tmp/document.xls'))
DBSession.add(doc)

Depot supports both LocalFileStorage, MongoDB's GridFSStorage, and Amazon's S3Storage. And, at least for files stored locally and in S3, the fileid will be generated by uuid.uuid1().

Martin Thorsen Ranang
  • 2,394
  • 1
  • 28
  • 43
2

I don't know much about Turbogears and whether it can provide anything to avoid the following, but it seems to me that this code is fraught with danger. It may be possible for a malicious user to overwrite (or create) any file that the Turbogears python process has write access to.

What if asset_dirname is /tmp, the contents of file.filename is ../../../../../../../etc/passwd and the contents of the file root::0:0:root:/root:/bin/bash? In a UNIX environment this code (permissions pending) would open the file /tmp/../../../../../../../etc/passwd in truncate mode and then copy the contents of the uploaded file to it - effectively overwriting your system's password file and specifying a root user without a password. Presumably there are nasty things that can be done to a Windows machine too.

OK, this is an extreme example that requires that python is running as root (no one does that, do they?). Even if python is running as a low-priveleged user, previously uploaded files could be overwritten at will.

To summarise, don't trust user input, in this case the user supplied filename that is available in file.filename.

mhawke
  • 84,695
  • 9
  • 117
  • 138
  • Aye, I realize that it's a risky system. I'm not really sure with how python usually handles files (Moving from a PHP background so it's a bit of a hard change, although I like python.) That's why I'm hoping someone who's designed a better Turbogears upload system would know and help out. And thanks, that's a useful first step. :) – William Chambers Mar 04 '10 at 02:53
  • I think that turbogears provides a set of useful controls called Tosca Widgets and that there is a widget for file uploads. I'd advise looking into that as you'll hopefully end up with something that standard (wrt turbogears) and which will hopefully avoid the kinds of vulnerabilities that I've pointed out in your example. – mhawke Mar 04 '10 at 03:41
  • 1
    Thank you 'very' much for that idea. It looks like Tosca Widgets does indeed have a 'file field' field. Their docs are somewhat confusing but I've found this so far. http://toscawidgets.org/documentation/tw.forms/modules/fields/basic_fields.html#filefield – William Chambers Mar 04 '10 at 04:05
2

@mhawke - you're right you have to handle that - depends on what you are doing with the file, if it doesn't matter if there is a name collision eg you only care for the latest version of some data then theres probably no issue, or if the filename isn't actually important just the file contents, but its still bad practice.

You could use a named tempfile in a tmp dir, then move the file once validated to its final location. Or you could check the filename doesn't already exist like so:

file.name = slugify(myfile.filename)
name, ext = os.path.splitext(file.name)
while os.path.exists(os.path.join(permanent_store, file.name)):
    name += '_'
    file.name = name + ext

raw_file = os.path.join(permanent_store, file.name)

The slugify method would be used to tidy up the filename...

Ross
  • 17,861
  • 2
  • 55
  • 73
  • 1
    Now we're getting into race conditions. There will be an interval of time between checking whether the file already exists and actually creating it. What if 2 users upload a file of the same name at the same time? `os.path.exists()` could return `False` in both instances, then when the files are created, one file will overwrite the other. – mhawke Mar 07 '10 at 23:54
0

Isn't turbogears just pylons with extras? You could check out the help there:

http://wiki.pylonshq.com/display/pylonsdocs/Form+Handling#file-uploads

However, that still contains the potential security flaw that mhawke mentioned:

os.path.join(permanent_store, myfile.filename.lstrip(os.sep))

Same as above really if filename somehow was ../../../../../etc/passwd then you could replace that file...

So you could just get the actual filename like so:

os.path.join(permanent_store, myfile.filename.split(os.sep).pop())
Ross
  • 17,861
  • 2
  • 55
  • 73
  • With `os.path.join(permanent_store,myfile.filename.split(os.sep).pop())` There's still the problem of trusting user supplied data. What if users uploaded files that had the same basename? If the basename collides, previsouly uploaded files would still be overwritten. – mhawke Mar 05 '10 at 00:01
  • So would a workable solution be to strip all forward slashes from the filename? – William Chambers Mar 05 '10 at 01:49
  • Well the pop just uses the filename and ignores the path and stops the security flaw in accessing outside your temp location. – Ross Mar 05 '10 at 08:05
0

Werkzeug has a very good helper function for securing file names called secure_filename. I think you can adopt and use it.

Mengu
  • 514
  • 5
  • 21
-1

about how to go, i'd second the good answers already given.

here's my 2 pence about the stored file naming.

indeed saving the file using the original name may cause a vulnerability. the only use i make of the original name, if at all, is to hint the mime type detection.

anyway the files to be saved should be given unique names, by a record identity or something similar, and kept in a place under the control of the app directory owner, who is a regular user, or in some other storage service, as the aforementioned depot etc.

it's a matter of cross language good system design :).

alex
  • 651
  • 1
  • 9
  • 11