1

To secure uploaded image names, I'd like to strip everything from the filenames except `string.ascii_letters`, `string.digits`, the dot, and (one) whitespace.

So I'm wondering: what is the best way to check a string for any other characters?

Jand
  • 1
    so (one) whitespace is mandatory in a filename? – Pruthvi Raj Sep 04 '15 at 18:13
  • 1
    Do have a look at this [answer](http://stackoverflow.com/questions/29998052/deleting-consonants-from-a-string-in-python/29998062#29998062) which has around 8 different ways of checking and removing certain characters. – Bhargav Rao Sep 04 '15 at 18:15
  • @PruthviRaj, well just to avoid messy names, yes. – Jand Sep 04 '15 at 18:22
  • @BhargavRao no. but it is definitely useful. Thanks! – Jand Sep 04 '15 at 18:32
  • @Paulrx Yep. I just added the link to show you a few ways. It certainly does not answer your question. Glad you found it useful. Cheers. – Bhargav Rao Sep 04 '15 at 18:34
  • What are you trying to secure against? Why not randomly generate the file names, and possibly store a mapping of real names to file paths in your database? – Colonel Thirty Two Sep 04 '15 at 19:33
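
The approach suggested in the last comment could look roughly like this. It is only a minimal sketch: `make_storage_name`, the `ALLOWED_EXTENSIONS` whitelist, and the in-memory `name_map` dict (standing in for a database table) are all hypothetical names chosen for illustration.

```python
import os
import uuid

# Assumed whitelist of image extensions; adjust to your needs.
ALLOWED_EXTENSIONS = {'.jpg', '.jpeg', '.png', '.gif'}

# Stand-in for a database table mapping stored names to original names.
name_map = {}


def make_storage_name(original_name):
    """Hypothetical helper: return a random file name and remember the original."""
    ext = os.path.splitext(original_name)[1].lower()
    if ext not in ALLOWED_EXTENSIONS:
        raise ValueError("Extension not supported")
    # uuid4().hex is 32 hexadecimal characters, so the stored name is always safe.
    stored = uuid.uuid4().hex + ext
    name_map[stored] = original_name
    return stored


stored_name = make_storage_name('some photo.JPG')
print(stored_name, '->', name_map[stored_name])
```

With this design the user-supplied name never reaches the file system, so no character filtering is needed for the stored file itself.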

3 Answers

4
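import os
import re

s = 'asodgnasAIDID12313%*(@&(!$ 1231'
# Remove every character outside the whitelist, and remove any run of
# two or more spaces entirely (a single space is kept).
result = re.sub(r'[^a-zA-Z0-9. ]|( ){2,}', '', s)
# Reject the name if nothing, or only whitespace, is left before the extension.
if result == '' or os.path.splitext(result)[0].isspace():
    print("not a valid name")
else:
    print("valid name")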

EDIT:

Changed it so that it whitelists only a single whitespace, and added `import re`.

DorElias
  • Very concise. To complete the answer just add `import re` . Thanks! – Jand Sep 04 '15 at 18:40
  • Sorry, I just found a caveat. If you choose a UTF-8 filename with several whitespaces, the final name will be something like `' '`, which is not desirable. Any ideas about this? – Jand Sep 04 '15 at 18:45
  • If I understand you correctly, then the result needs to be empty. If so, this should work now (I replaced the double space followed by `+` with `( ){2,}`, which matches at least two spaces, so an odd number of spaces should still work). – DorElias Sep 04 '15 at 18:50
  • 1
    The problem with this solution is that if you enter non-ASCII filenames, like `s = '些 些.jpg'`, the result will be `' .jpg'`, which is not good. – Jand Sep 04 '15 at 19:05
  • Well, this is a problem with your requirement, as this is a single space, which satisfies the rules you specified. In this case I would add a special case to handle it. – DorElias Sep 04 '15 at 19:10
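
The special case discussed in the comments above could be handled along these lines. This is only a sketch, not the answerer's code: the helper name `sanitize_image_name` and the choice to return `None` for a rejected name are assumptions, and it reads "(one) whitespace" as "collapse each run of spaces to a single space".

```python
import os
import re


def sanitize_image_name(name):
    """Hypothetical helper: return a cleaned file name, or None if nothing usable is left."""
    stem, ext = os.path.splitext(name)
    # Keep only ASCII letters, digits and spaces in the part before the extension.
    stem = re.sub(r'[^a-zA-Z0-9 ]', '', stem)
    # Collapse each run of spaces to one space and trim both ends.
    stem = re.sub(r' {2,}', ' ', stem).strip()
    # Keep only ASCII letters and digits in the extension (the dot is re-added below).
    ext = re.sub(r'[^a-zA-Z0-9]', '', ext)
    if not stem:
        return None  # e.g. '些 些.jpg' leaves no usable stem
    return stem + '.' + ext if ext else stem


print(sanitize_image_name('些 些.jpg'))
print(sanitize_image_name('my   holiday%photo.jpg'))
```

Treating the stem and the extension separately avoids results such as `' .jpg'`, because an empty stem is rejected before the extension is appended.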
1

Not sure if it's what you need but give it a try:

import os

fileName, fileExtension = os.path.splitext('image  11%%22.jpg')
# Drop non-ASCII characters, then decode back to text so the comparisons below work
fileExtension = fileExtension.encode('ascii', 'ignore').decode('ascii')
fileName = fileName.encode('ascii', 'ignore').decode('ascii')
if fileExtension[1:] in ['jpg', 'jpeg', 'png', 'gif', 'bmp', 'tiff', 'tga']:
    # Keep only letters and digits in the base name
    fileName = ''.join(e for e in fileName if e.isalnum())
    print(fileName + fileExtension)
    # image1122.jpg
else:
    print("Extension not supported")

See `str.isalnum()`: https://docs.python.org/2/library/stdtypes.html#str.isalnum

Pedro Lobito
0

I wouldn't use regex for this. The only tricky requirement is the single space, but that can be done, too.

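import string

whitelist = set(string.ascii_letters + string.digits)
good_filename = "herearesomelettersand123numbers andonespace"
bad_filename = "symbols&#! and more than one space"

def strip_filename(fname, whitelist):
    """Strips a filename

    Removes any character from string `fname` that is not in `whitelist`
    and removes all but the first whitespace.
    """

    # Allow spaces without mutating the caller's set
    allowed = set(whitelist) | {" "}

    stripped = ''.join(ch for ch in fname if ch in allowed)
    split = stripped.split()
    if not split:
        return ''  # nothing usable is left after filtering
    # Keep the first space only and glue the remaining words together;
    # rstrip() avoids a trailing space when there is just one word
    result = " ".join([split[0], ''.join(split[1:])]).rstrip()
    return result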

Then call it with:

good_sanitized = strip_filename(good_filename, whitelist)
bad_sanitized = strip_filename(bad_filename, whitelist)
print(good_sanitized)
# 'herearesomelettersand123numbers andonespace'
print(bad_sanitized)
# 'symbols andmorethanonespace'
Adam Smith