157

In Python, what is a good (or the best) way to generate some random text to prepend to a file name that I'm saving to a server, just to make sure it does not overwrite an existing file? Thank you!

Óscar López
zallarak

15 Answers

189

You could use the UUID module for generating a random string:

import uuid
filename = str(uuid.uuid4())

This is a valid choice, given that a UUID generator is extremely unlikely to produce a duplicate identifier (a file name, in this case):

Only after generating 1 billion UUIDs every second for the next 100 years would the probability of creating just one duplicate reach about 50%. Similarly, the probability of one duplicate would be about 50% if every person on Earth owned 600 million UUIDs.
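A minimal sketch of prepending a UUID to the original file name, as the question asks. It assumes the files are written to a local directory; the `save_upload` helper and the example names are hypothetical. Opening with mode `'x'` (exclusive creation) makes `open()` raise `FileExistsError` instead of silently overwriting, so even the astronomically unlikely duplicate cannot clobber an existing file.

```python
import os
import tempfile
import uuid


def save_upload(upload_dir, original_name, data):
    """Hypothetical helper: save `data` as '<uuid hex>_<original_name>'."""
    # uuid4().hex is 32 hex characters with no dashes
    unique_name = f"{uuid.uuid4().hex}_{original_name}"
    path = os.path.join(upload_dir, unique_name)
    # 'xb' = exclusive creation, binary: raises FileExistsError if path exists
    with open(path, "xb") as f:
        f.write(data)
    return path


# Usage example in a throwaway directory
with tempfile.TemporaryDirectory() as tmp:
    saved = save_upload(tmp, "report.pdf", b"%PDF-1.4 ...")
    print(os.path.basename(saved))  # '<32 hex chars>_report.pdf'
```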

Óscar López
  • this is also very useful when you want a unique filename, but don't want it created just yet. – Prof. Falken May 14 '13 at 08:44
  • Or use `uuid.uuid4().hex` to get a hex string without dashes (`-`). – Rockallite Nov 06 '15 at 08:25
  • This is also a useful alternative to NamedTemporaryFile() if you lack privileges on the C: drive, e.g. on a work computer. This was the case for me, but the uuid method worked, as it's just a random string you can use to save in the local folder. – user3376851 May 05 '22 at 14:36
151

Python has facilities to generate temporary file names; see http://docs.python.org/library/tempfile.html. For instance:

In [4]: import tempfile

Each call to tempfile.NamedTemporaryFile() results in a different temp file, and its name can be accessed with the .name attribute, e.g.:

In [5]: tf = tempfile.NamedTemporaryFile()
In [6]: tf.name
Out[6]: 'c:\\blabla\\locals~1\\temp\\tmptecp3i'

In [7]: tf = tempfile.NamedTemporaryFile()
In [8]: tf.name
Out[8]: 'c:\\blabla\\locals~1\\temp\\tmpr8vvme'

Once you have the unique filename it can be used like any regular file. Note: By default the file will be deleted when it is closed. However, if the delete parameter is False, the file is not automatically deleted.

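Full parameter set (this is the Python 2 signature; in Python 3, `bufsize` is named `buffering`, and further parameters such as `encoding` and `newline` were added):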

tempfile.NamedTemporaryFile([mode='w+b'[, bufsize=-1[, suffix=''[, prefix='tmp'[, dir=None[, delete=True]]]]]])

It is also possible to specify the prefix for the temporary file (one of the various parameters that can be supplied when the file is created):

In [9]: tf = tempfile.NamedTemporaryFile(prefix="zz")
In [10]: tf.name
Out[10]: 'c:\\blabla\\locals~1\\temp\\zzrc3pzk'
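A Python 3 sketch of the same idea for the question's use case: pass `delete=False` so the saved file survives `close()`, and shape the name with `prefix`, `suffix` and `dir`. Here `target_dir` is an assumption standing in for the real server directory (a fresh temporary directory in this sketch), and the PNG bytes are placeholder data.

```python
import os
import tempfile

# Assumption: target_dir stands in for the server's upload directory.
target_dir = tempfile.mkdtemp()

# delete=False keeps the file after it is closed; prefix/suffix shape the name.
with tempfile.NamedTemporaryFile(
    mode="wb", prefix="upload_", suffix=".png", dir=target_dir, delete=False
) as tf:
    tf.write(b"\x89PNG...")
    saved_path = tf.name

print(os.path.basename(saved_path))  # 'upload_' + random characters + '.png'
print(os.path.exists(saved_path))    # True: the file survived close()
```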

Additional examples for working with temporary files can be found here

Levon
  • Would those files get deleted next time I restart my machine? – ABCD Jan 26 '16 at 04:41
  • The problem with this solution is that it generates not only a file name, but also a file that is already open. If you need a temporary file name for a new, not yet existing file (e.g., to use as output of an OS command), this will not do. In that case, you can do something like str(uuid.uuid4()). – Luca Jul 23 '16 at 18:20
  • @Luca Thanks for the additional comment, that is useful, and noted for future reference. However, the OP clearly stated that he/she wanted to save a file, and hence needs to open it, so this solution provides for that. – Levon Jul 23 '16 at 18:27
  • It depends. Perhaps he needs the name to construct an appropriate server call. Not sure. At any rate your reply is certainly the more common case. – Luca Jul 24 '16 at 00:25
28

A common approach is to add a timestamp as a prefix/suffix to the filename to give the file some temporal relation. If you need more uniqueness, you can still add a random string to this.

import datetime
basename = "mylogfile"
suffix = datetime.datetime.now().strftime("%y%m%d_%H%M%S")
filename = "_".join([basename, suffix]) # e.g. 'mylogfile_120508_171442'
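A sketch that combines the timestamp with the two refinements suggested in the comments: microseconds (`%f`) and a short random string for names generated at the same instant. It assumes Python 3.6+ for the `secrets` module; the `timestamped_name` helper is hypothetical. As the first comment notes, this does not by itself remove the check-then-create race, so pair it with exclusive creation (`open(path, "x")`) if several processes write to the same directory.

```python
import datetime
import secrets


def timestamped_name(basename, ext=""):
    """Hypothetical helper: '<basename>_<yymmdd>_<HHMMSS>_<microseconds>_<hex>'."""
    stamp = datetime.datetime.now().strftime("%y%m%d_%H%M%S_%f")
    token = secrets.token_hex(4)  # 8 random hex chars to break same-instant ties
    return f"{basename}_{stamp}_{token}{ext}"


print(timestamped_name("mylogfile", ".log"))
```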
moooeeeep
  • In a multi-threaded environment, there's a possible race condition involved in the sequence `1. Test if file exists, 2. create file.` If another process interrupts yours between steps 1 and 2, and creates the file, when your code resumes it will overwrite the other process' file. – Li-aung Yip May 08 '12 at 15:36
  • @Li-aungYip In addition, you can also append a 6-8 character [random character sequence](http://stackoverflow.com/questions/2257441/python-random-string-generation-with-upper-case-letters-and-digits/2257449#2257449) (in case 2 files are generated in the same second). – bobobobo Apr 09 '13 at 17:50
  • @bobobobo: Or you could use the `tempfile` module, which handles this for you. :) – Li-aung Yip Apr 10 '13 at 02:57
  • I'd suggest adding microseconds, i.e. `...strftime("%y%m%d_%H%M%S%f")` – AstraSerg Oct 31 '19 at 20:21
12

The OP asked how to create random filenames, not random files. Times and UUIDs can collide. If you are working on a single machine (not a shared filesystem) and your process/thread will not stomp on itself, use os.getpid() to get your own PID and use it as an element of a unique filename; no other process running at the same time will have the same PID. If you are multithreaded, also include the thread ID. If a single thread or process can generate several different temp files, you might need another technique. A rolling index can work (as long as you don't keep the files so long, or use so many of them, that you have to worry about rollover). Keeping a global hash/index of "active" files would suffice in that case.

Sorry for the long-winded explanation, but it really does depend on your exact usage.
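A minimal sketch of the PID + thread ID + index scheme described above. The `local_unique_name` helper and the lock-protected counter are illustrative assumptions. Python integers do not overflow, so this counter never rolls over; it only guarantees uniqueness while this process is alive on one machine, because PIDs are reused once a process exits.

```python
import itertools
import os
import threading

# Process-wide counter; the lock makes next() safe to call from several threads.
_counter = itertools.count()
_counter_lock = threading.Lock()


def local_unique_name(prefix="tmp", ext=""):
    """Hypothetical helper: unique among names made by this live process
    on this machine (not across machines or shared filesystems)."""
    with _counter_lock:
        n = next(_counter)
    pid = os.getpid()            # distinguishes concurrently running processes
    tid = threading.get_ident()  # distinguishes threads within this process
    return f"{prefix}_{pid}_{tid}_{n}{ext}"


print(local_unique_name("upload", ".dat"))
print(local_unique_name("upload", ".dat"))  # same pid/tid, counter incremented
```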

Brad
11

If you don't need a file path, but only a random string of a predefined length, you can use something like this:

>>> import random
>>> import string

>>> file_name = ''.join(random.choice(string.ascii_lowercase) for i in range(16))
>>> file_name
'ytrvmyhkaxlfaugx'
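The `random` module is a pseudo-random generator that is not meant for security. If the names must also be hard to guess (for example, uploads served publicly from the server), a variant using the `secrets` module (Python 3.6+) is a drop-in alternative. The `random_text` helper and its `alphabet` parameter are illustrative; with 16 lowercase letters there are 26^16, about 4.4 × 10^22, possible names.

```python
import secrets
import string


def random_text(length=16, alphabet=string.ascii_lowercase):
    """Random string of `length` characters drawn from `alphabet`,
    using the operating system's cryptographically secure source."""
    return "".join(secrets.choice(alphabet) for _ in range(length))


print(random_text())                                         # 16 lowercase letters
print(random_text(8, string.ascii_letters + string.digits))  # 8 letters/digits
```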
4xy
9

If you want to preserve the original filename as a part of the new filename, unique prefixes of uniform length can be generated by using MD5 hashes of the current time:

from hashlib import md5
from time import localtime

def add_prefix(filename):
    prefix = md5(str(localtime()).encode('utf-8')).hexdigest()
    return f"{prefix}_{filename}"

Calls to add_prefix('style.css') made in different seconds generate a sequence like:

a38ff35794ae366e442a0606e67035ba_style.css
7a5f8289323b0ebfdbc7c840ad3cb67b_style.css
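Because `localtime()` has one-second resolution, two calls within the same second hash the same input and return the same prefix, and (as a comment below points out) hashing a timestamp adds no uniqueness by itself. If the goal is just a uniform-length prefix, this sketch swaps the MD5-of-time for 16 random bytes from `secrets.token_hex`, which gives a 32-character hex prefix, the same length as an MD5 hexdigest. This is a substitute technique, not the original answer's.

```python
import secrets


def add_prefix(filename):
    # token_hex(16) -> 32 hex characters, same length as md5(...).hexdigest()
    prefix = secrets.token_hex(16)
    return f"{prefix}_{filename}"


print(add_prefix("style.css"))  # '<32 hex chars>_style.css'
```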
niczky12
Aleš Kotnik
  • To avoid the error "Unicode-objects must be encoded before hashing", I changed it to md5(str(localtime()).encode('utf-8')).hexdigest() – PhoebeB Aug 04 '17 at 12:17
  • Note that a hash of any kind of data (including a timestamp) does not ensure uniqueness by itself (any more than a randomly chosen byte sequence does). – Peter O. Jun 24 '20 at 20:51
4

Adding my two cents here:

In [19]: tempfile.mkstemp('.png', 'bingo', '/tmp')[1]
Out[19]: '/tmp/bingoy6s3_k.png'

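According to the Python documentation for tempfile.mkstemp, it creates a temporary file in the most secure manner possible. It returns a tuple of an open OS-level file descriptor and the file's absolute path, so [1] selects the path; the descriptor stays open until you close it. Please note that the file will exist after this call: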

In [20]: os.path.exists(tempfile.mkstemp('.png', 'bingo', '/tmp')[1])
Out[20]: True
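Indexing with `[1]` discards the open descriptor that `mkstemp` returns, which leaks it. A sketch of the fuller pattern wraps the descriptor with `os.fdopen` so it is closed after writing. It uses `tempfile.gettempdir()` instead of the hard-coded `/tmp` so it also runs on Windows; the written bytes are placeholder data.

```python
import os
import tempfile

# mkstemp creates the file and returns (open OS-level fd, absolute path).
fd, path = tempfile.mkstemp(suffix=".png", prefix="bingo", dir=tempfile.gettempdir())

# Wrap the descriptor in a file object so it is closed when the block ends.
with os.fdopen(fd, "wb") as f:
    f.write(b"image bytes go here")

print(path)                  # '.../bingo<random chars>.png'
print(os.path.exists(path))  # True: mkstemp never deletes the file for you
```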
happyhuman
3

Since a date-time stamp only changes once per second, two files saved in the same second would get the same name, so you need to concatenate the date-time with a UUID (Universally Unique Identifier). Here is the complete code:

import uuid
from datetime import datetime

# -> '<32 hex chars>-<YYYYmmddHHMMSS>.jpeg'
imageName = '{}{:-%Y%m%d%H%M%S}.jpeg'.format(uuid.uuid4().hex, datetime.now())
Asad Farooq
  • While this code may solve the question, [including an explanation](//meta.stackexchange.com/q/114762) of how and why this solves the problem would really help to improve the quality of your post, and probably result in more up-votes. Remember that you are answering the question for readers in the future, not just the person asking now. Please edit your answer to add explanations and give an indication of what limitations and assumptions apply. – rizerphe Jun 29 '20 at 12:36
1

I personally prefer my text to be not only random/unique but also nice-looking, which is why I like the hashids library: it generates short, random-looking IDs from integers. It can be installed with:

pip install hashids

Snippet:

from hashids import Hashids

hashids = Hashids(salt="this is my salt")
print(hashids.encode(1, 2, 3))  # laHquq

Short Description:

Hashids is a small open-source library that generates short, unique, non-sequential ids from numbers.

user1767754
1
>>> import random
>>> import string    
>>> alias = ''.join(random.choice(string.ascii_letters) for _ in range(16))
>>> alias
'WrVkPmjeSOgTmCRG'

You can replace string.ascii_letters with any other character set to generate other kinds of text, for example a mobile number or an ID.

Freman Zhang
1
import random

def Generate(): #function generates a random 6 digit number
    code = ''
    for i in range(6):
        code += str(random.randint(0,9))
    return code

print(Generate()+".txt")
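A six-digit code has only 10^6 possible values, so collisions arrive much sooner than intuition suggests. This sketch computes the exact birthday-problem probability that at least two of `n` uniformly random names coincide (the `collision_probability` helper is illustrative): with 10^6 possible names, the chance reaches about 50% after roughly 1,200 files, which is why a much larger space (such as a UUID) is preferable on a server.

```python
def collision_probability(n, space=10**6):
    """Probability that at least two of `n` names drawn uniformly at random
    from `space` possibilities are equal (the birthday problem)."""
    p_all_distinct = 1.0
    for k in range(n):
        # the k-th name must avoid the k names already taken
        p_all_distinct *= 1 - k / space
    return 1 - p_all_distinct


for n in (100, 1178, 5000):
    print(n, round(collision_probability(n), 3))
```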

1

In some other cases, if you need the random file name to look sensible, use the faker module. It produces "sensible" file names with common extensions. This method can produce name collisions after a while, so I think prepending a UUID is probably better.

pip install faker

Then,

from faker import Faker

fake = Faker()
for _ in range(10):
    print(fake.file_name())

Link to faker documentation: https://faker.readthedocs.io/en/master/index.html

Kheng
1

I found considerable benefit to the following solution.

I needed to store hundreds of thousands of images in a directory on a remote server. Space on the server was limited, and many of these images were duplicates of each other, so I found a handy solution: use the SHA-256 hash of the image as the filename and store the file under that name. If it overwrites a file, that file was a duplicate, so it's okay that it gets overwritten. No SHA-256 collision has ever been found, and finding one is considered computationally infeasible, so in practice you can rely on it without worry.

import hashlib

image_path = '/tmp/my_image.png'
with open(image_path, 'rb') as f:
    image_bytes = f.read()

file_name = f'{hashlib.sha256(image_bytes).hexdigest()}.png'
with open(file_name, 'wb') as ff:
    ff.write(image_bytes)

This method is only useful if the following apply to your situation.

  1. You do not care about file creation date.
  2. You do not rely on duplicate files for your system to work.
  3. You do not need to search through the images using their names (that would suck)

This method is also of little benefit if the files you are saving are almost certainly never identical (and so will never produce the same SHA), but it can still be useful if you are saving several thousand files and need to guarantee that none of them overwrite each other (besides duplicates).
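A sketch of the same content-addressed idea with two variations that are assumptions, not the original answer's code: the file is hashed in chunks so large images are not read into memory at once, and the copy is skipped when a file with that digest already exists, which saves the redundant write. The `store_by_content` helper and the sample files are hypothetical.

```python
import hashlib
import os
import shutil
import tempfile


def store_by_content(src_path, dest_dir, ext=".png"):
    """Hypothetical helper: copy `src_path` into `dest_dir` as
    '<sha256 hex digest><ext>', skipping the copy if it already exists."""
    h = hashlib.sha256()
    with open(src_path, "rb") as f:
        # Hash in 64 KiB chunks instead of reading the whole file at once
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    dest = os.path.join(dest_dir, h.hexdigest() + ext)
    if not os.path.exists(dest):
        shutil.copyfile(src_path, dest)
    return dest


# Usage: two identical files end up as a single stored copy
src_dir, dest_dir = tempfile.mkdtemp(), tempfile.mkdtemp()
for name, data in [("a.png", b"same"), ("b.png", b"same"), ("c.png", b"other")]:
    with open(os.path.join(src_dir, name), "wb") as f:
        f.write(data)
stored = [store_by_content(os.path.join(src_dir, n), dest_dir)
          for n in ("a.png", "b.png", "c.png")]
print(len(set(stored)))  # 2
```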

destent
0

I am not sure why no-one mentioned this, but one can get the time down to microseconds, so, one solution is:

import datetime

image_name = datetime.datetime.now().strftime('%Y_%m_%d_%H-%M-%S.%f')
image_name = image_name+".jpeg" #or whatever extension is needed

The output would look something like:

2023_03_21_11-08-08.734965.jpeg

This has the advantage that sorting the file names alphabetically also sorts the files by date and time.
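A small check of the sorting claim, using fixed example datetimes (illustrative values) so the result is reproducible. It works because every field is zero-padded and ordered from most to least significant. One caveat: local wall-clock time can jump backwards (for example when daylight saving time ends), so names built from `datetime.now()` can briefly sort out of order; `datetime.datetime.now(datetime.timezone.utc)` avoids that.

```python
import datetime

FMT = "%Y_%m_%d_%H-%M-%S.%f"  # the format used in this answer


def image_name(when, ext=".jpeg"):
    return when.strftime(FMT) + ext


times = [
    datetime.datetime(2023, 3, 21, 11, 8, 8, 734965),
    datetime.datetime(2023, 3, 21, 11, 8, 8, 734966),
    datetime.datetime(2023, 12, 1, 0, 0, 0),
    datetime.datetime(2024, 1, 2, 9, 30, 0),
]
names = [image_name(t) for t in times]  # already in chronological order

print(sorted(names) == names)  # True: string order matches time order
print(names[0])                # 2023_03_21_11-08-08.734965.jpeg
```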

user
-3

You could use the random package:

import random
file_name = str(random.random())  # random.random() returns a float, so convert it to a string
anajem