45

I'm writing a program that creates a file and saves it to a directory under the filename sample.xml. When I run the program again, the new file overwrites the old one because both have the same name. How do I increment the file name so that each run produces a new file instead of overwriting the existing one? I am thinking of checking the directory for the filename first and, if it already exists, generating a new filename:

fh = open("sample.xml", "w")
rs = [blockresult]
fh.writelines(rs)
fh.close()
ndmeiri
Oliver Ven Quilnet

15 Answers

73

I would iterate through sample[int].xml for example and grab the next available name that is not used by a file or directory.

import os

i = 0
while os.path.exists("sample%s.xml" % i):
    i += 1

fh = open("sample%s.xml" % i, "w")
....

That should give you sample0.xml initially, then sample1.xml, etc.

Note that the relative file notation by default relates to the file directory/folder you run the code from. Use absolute paths if necessary. Use os.getcwd() to read your current dir and os.chdir(path_to_dir) to set a new current dir.
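
To make the note about relative paths concrete, here is a minimal sketch that builds every candidate name inside an explicitly chosen directory, so the result does not depend on the current directory. The helper name `next_sample_path` and its `directory` parameter are illustrative assumptions, not part of the answer above.

```python
import os


def next_sample_path(directory):
    """Return the first unused sampleN.xml path inside `directory`.

    Hypothetical helper: the name and signature are for illustration only.
    """
    i = 0
    # Join the directory and the file name so the existence check never
    # depends on the process's current working directory.
    while os.path.exists(os.path.join(directory, "sample%s.xml" % i)):
        i += 1
    return os.path.join(directory, "sample%s.xml" % i)
```

Passing an absolute directory, for example `os.path.dirname(os.path.abspath(__file__))` for the folder of the running script, keeps the numbering stable even if the program later calls `os.chdir()`.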

Eric O. Lebigot
bossi
  • Kindly asking what is non-useful or unconstructive here? Voting down without leaving a (constructive) comment seems to be more unconstructive to me. – bossi Aug 01 '13 at 05:18
  • `isfile()` is not correct: a directory will match. You want `exists()` instead, but this is @Eiyrioü von Kauyf's answer. Furthermore, relative paths are not exactly "relative to the directory where the code is run from". They are instead more generally relative to the "current directory" (which is by default the directory that the code is run from). The current directory can be changed within the program, for instance. – Eric O. Lebigot Aug 01 '13 at 05:24
  • The fact that os.path.isfile() matches directories is new to me (and doesn't do as you describe for me on Python 3.3/win), isn't that why there is os.path.isdir() in place to differentiate between the two? In regards to the comment in my post towards the relative path notation neither Oliver Ven Quilnet's nor my example explicitly changes the _current directory_ and I thought I briefly point it out to make it clear _for the given context_. – bossi Aug 01 '13 at 05:48
  • You are right, I should have been clearer. I meant that `isfile()` will make your loop exit when the name matches a directory, and your code tries then to open the directory in write mode, which fails with `IOError`. This is why `isfile()` is not the correct test, and should be replaced by the `exists()` of @Eiyrioü von Kauyf. As for relative paths, I really think that the current "the relative file notation always relates to the file directory/folder you run the code from" is misleading (because of "always"). – Eric O. Lebigot Aug 01 '13 at 06:41
  • @EOL: That's a good point, I honestly wasn't aware that identical names between a file and a folder in the same directory are illegal under Windows; thanks for pointing that out. I agree with you, the remark about the relative path did sound misleading, it should sound clearer now. – bossi Aug 01 '13 at 07:52
  • Yeah, having a file and a folder with the same name is illegal in any file system I know of (NTFS, HFS+, ext3,…). I simplified the use of the Python formatting operator in your answer. – Eric O. Lebigot Aug 01 '13 at 08:18
26

Sequentially checking each file name to find the next available one works fine with small numbers of files, but quickly becomes slower as the number of files increases.

Here is a version that finds the next available file name in log(n) time:

import os

def next_path(path_pattern):
    """
    Finds the next free path in a sequentially named list of files

    e.g. path_pattern = 'file-%s.txt':

    file-1.txt
    file-2.txt
    file-3.txt

    Runs in log(n) time where n is the number of existing files in sequence
    """
    i = 1

    # First do an exponential search
    while os.path.exists(path_pattern % i):
        i = i * 2

    # Result lies somewhere in the interval (i/2..i]
    # We call this interval (a..b] and narrow it down until a + 1 = b
    a, b = (i // 2, i)
    while a + 1 < b:
        c = (a + b) // 2 # interval midpoint
        a, b = (c, b) if os.path.exists(path_pattern % c) else (a, c)

    return path_pattern % b

To measure the speed improvement I wrote a small test loop that creates 9,999 files:

for i in range(1,10000):
    with open(next_path('file-%s.foo'), 'w'):
        pass

And implemented the naive approach:

def next_path_naive(path_pattern):
    """
    Naive (slow) version of next_path
    """
    i = 1
    while os.path.exists(path_pattern % i):
        i += 1
    return path_pattern % i

And here are the results:

Fast version:

real    0m2.132s
user    0m0.773s
sys     0m1.312s

Naive version:

real    2m36.480s
user    1m12.671s
sys     1m22.425s

Finally, note that either approach is susceptible to race conditions if multiple actors are trying to create files in the sequence at the same time.
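
One way to narrow that race window, sketched here as an assumption rather than as part of the benchmarked code above, is to let the operating system refuse to overwrite. Opening with mode `"x"` (exclusive creation, available since Python 3.3) raises `FileExistsError` if another process created the name first, and the loop then tries the next number. The helper name `create_next_file` is hypothetical.

```python
import itertools


def create_next_file(path_pattern):
    """Create and open the first free path matching `path_pattern`.

    Returns the open file object; its `.name` attribute holds the path.
    """
    for i in itertools.count(1):
        path = path_pattern % i
        try:
            # Mode "x" creates the file only if it does not exist yet,
            # so the existence check and the creation are a single step.
            return open(path, "x")
        except FileExistsError:
            continue  # the name is already taken; try the next number
```

This sketch uses the linear scan for simplicity, so it gives up the log(n) lookup in exchange for not overwriting a file that appeared between the check and the open.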

James
  • Note that this code seems to have some float/int confusion and has been putting extra periods in my filenames (e.g. `file-6.0.txt` instead of `file-6.txt`). I like the principle of this answer, though. – Giselle Serate Jul 18 '19 at 18:11
  • Thanks @GiselleSerate, it looks like Python 3 handles integer division differently to Python 2. I've updated the code to use the `//` operator instead of `/` which seems to fix the problem. – James Jul 22 '19 at 06:14
16
import os

def get_nonexistant_path(fname_path):
    """
    Get the path to a filename which does not exist by incrementing path.

    Examples
    --------
    >>> get_nonexistant_path('/etc/issue')
    '/etc/issue-1'
    >>> get_nonexistant_path('whatever/1337bla.py')
    'whatever/1337bla.py'
    """
    if not os.path.exists(fname_path):
        return fname_path
    filename, file_extension = os.path.splitext(fname_path)
    i = 1
    new_fname = "{}-{}{}".format(filename, i, file_extension)
    while os.path.exists(new_fname):
        i += 1
        new_fname = "{}-{}{}".format(filename, i, file_extension)
    return new_fname

Before you open the file, call

fname = get_nonexistant_path("sample.xml")

This will either give you 'sample.xml' or, if this already exists, 'sample-i.xml' where i is the lowest positive integer such that the file does not already exist.

I recommend using os.path.abspath("sample.xml"). If you have ~ as home directory, you might need to expand it first.
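
As a minimal sketch of that recommendation, `os.path.expanduser` replaces a leading `~` and `os.path.abspath` then makes the result absolute. The wrapper name `normalize_path` is an assumption for illustration.

```python
import os


def normalize_path(path):
    """Expand a leading "~" and return an absolute path."""
    return os.path.abspath(os.path.expanduser(path))
```

The result can then be handed to the function above, for example `get_nonexistant_path(normalize_path("~/sample.xml"))`.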

Please note that race conditions might occur with this simple code if you have multiple instances running at the same time. If this might be a problem, please check this question.

Community
Martin Thoma
6

Try setting a counter variable and incrementing it inside the same loop in which you write your file. Put the counter into the file name with a format specifier, so that every iteration adds 1 to the counter and to the number in the file name.

Some code from a project I just finished:

numberLoops = 10  # placeholder: some limit determined by the user
currentLoop = 1
while currentLoop < numberLoops:
    currentLoop = currentLoop + 1

    # %s (not %d) for the timestamp, because str(now()) is a string
    fileName = "log%d_%s.txt" % (currentLoop, str(now()))

For reference:

from time import mktime, gmtime

def now():
    return mktime(gmtime())

which is probably irrelevant in your case, but I was running multiple instances of this program and making tons of files. Hope this helps!
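
For readers who want the counter-plus-timestamp idea in isolation, here is a minimal sketch. The function name `timestamped_name` and the `%Y%m%dT%H%M%S` layout are assumptions chosen for illustration; two processes that use the same counter within the same second would still produce the same name.

```python
from datetime import datetime, timezone


def timestamped_name(index, when=None):
    """Build a log file name from a counter and a timestamp."""
    if when is None:
        when = datetime.now(timezone.utc)  # default to the current UTC time
    return "log%d_%s.txt" % (index, when.strftime("%Y%m%dT%H%M%S"))
```

Passing `when` explicitly makes the result reproducible, which is handy for testing.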

zanetu
ford
  • Python has for loops for this, they are much faster to read and comprehend than the while loops that emulate them. Furthermore, the `%` operator is deprecated. No downvote, though, because it does the job; it just does not do it in the preferred Python way. – Eric O. Lebigot Aug 01 '13 at 05:19
  • There is a problem with your format string: you format a string with `%d`, and this raises an exception. – Eric O. Lebigot Aug 01 '13 at 05:31
  • Thanks for catching that. It should be a %s, I retyped this rather hastily instead of copying from my source. Thanks! – ford Aug 01 '13 at 05:36
3

The two ways to do it are:

  1. Check for the existence of the old file and if it exists try the next file name +1
  2. save state data somewhere

An easy way to do it off the bat would be:

import os.path as pth
filename = "myfile"
filenum = 1
while pth.exists(pth.abspath(filename+str(filenum)+".py")):
    filenum+=1
my_next_file = open(filename+str(filenum)+".py",'w')

As a design note, I avoid while True here: in my view it slows things down and hurts code readability.

Edited after @EOL's contributions and thoughts:

I think the version without .format is more readable at first glance, but using .format is better for generality and convention, so:

import os.path as pth
filename = "myfile"
filenum = 1
while pth.exists(pth.abspath(filename+str(filenum)+".py")):
    filenum+=1
my_next_file = open("{}{}.py".format(filename, filenum),'w')
# or, equivalently:
# my_next_file = open(filename + "{}.py".format(filenum),'w')

and you don't have to use abspath - you can use relative paths if you prefer, I prefer abs path sometimes because it helps to normalize the paths passed :).

import os.path as pth
filename = "myfile"
filenum = 1
while pth.exists(filename+str(filenum)+".py"):
    filenum+=1
##removed for conciseness
Eiyrioü von Kauyf
  • The `format()` method is much more legible than string concatenation, here. I think that the while loop is fine, here. On another topic, why use `abspath()`? – Eric O. Lebigot Aug 01 '13 at 05:23
  • format is more legible, but then he would have to look at string formatting; this is easier to understand on first glance imho. and abspath because i'm ignoring symlinks :/ .... that could lead to confusing errors – Eiyrioü von Kauyf Aug 01 '13 at 05:26
  • While I understand your point, I believe that even beginners should be shown Pythonic examples, so that they take good habits. The behavior of `format()` is really quite simple to understand and even guess: `"{}{}.py".format(filename, filenum)`. It's even simpler than the algorithm presented here. :) – Eric O. Lebigot Aug 01 '13 at 05:29
  • @EOL whatcha think ;) do I have your approval – Eiyrioü von Kauyf Aug 01 '13 at 05:34
3

Another solution that avoids the while loop is to use the os.listdir() function, which returns a list of all the files and directories contained in the directory whose path is passed as an argument.

To answer the example in the question, supposing that the directory you are working in contains only "sample_i.xml" files indexed starting at 0, you can easily obtain the next index for the new file with the following code.

import os

new_index = len(os.listdir('path_to_file_containing_only_sample_i_files'))
new_file = open('path_to_file_containing_only_sample_i_files/sample_%s.xml' % new_index, 'w')
Malo Pocheau
  • While this won't handle skipped numbers well, as long as that's not a concern, this is a brilliantly simple way to achieve the goal. – David Parks May 19 '19 at 01:35
  • Yes, provided the files in the given directory aren't ever going to change (which may produce unwanted side effects), this is an excellent answer – Mick McCarthy Mar 21 '22 at 10:22
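
If skipped numbers or unrelated files in the directory are a concern, as the comments above point out, a hedged variant is to parse the index out of each matching name and add one to the largest. The helper name `next_index` and the `sample_(\d+)\.xml` pattern are assumptions based on the naming used in this answer.

```python
import os
import re

# Only names of the form sample_<digits>.xml are counted.
SAMPLE_RE = re.compile(r"^sample_(\d+)\.xml$")


def next_index(directory):
    """Return one more than the highest sample_N.xml index in `directory`."""
    indices = [
        int(match.group(1))
        for match in map(SAMPLE_RE.match, os.listdir(directory))
        if match
    ]
    # default=-1 makes an empty directory start at index 0
    return max(indices, default=-1) + 1
```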
3

You can use a while loop with a counter that checks whether a file with the given name and the counter's value exists; if it does, move on to the next value, otherwise break out and create the file.

I have done it in this way for one of my projects:

from os import path

i = 0
flnm = "Directory\\Filename" + str(i) + ".txt"
while path.exists(flnm):
    i += 1  # increment first so that each name is checked only once
    flnm = "Directory\\Filename" + str(i) + ".txt"
f = open(flnm, "w")  # do what you want with that file...
f.write(str(var))  # var is whatever data you want to save
f.close()  # make sure to close it

Here the counter i starts at 0, and the while loop checks each time whether the file exists; if it does, the loop moves on to the next number, otherwise it exits and the file is created, which you can then customize. Make sure to close the file, otherwise it stays open, which can cause problems when you later try to delete it. I used path.exists() to check whether a file exists. Don't use from os import *: it can cause problems with open(), because os.open() is a different function and calling it by accident can raise an error such as TypeError: Integer expected. (got str). Happy New Year to all!

typedecker
2

Without storing state data in an extra file, a quicker solution than the ones presented here would be to do the following:

from glob import glob
import os

files = glob("somedir/sample*.xml")
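# sort by the numeric part so that sample10.xml comes after sample9.xml
files = sorted(files, key=lambda f: int(os.path.basename(f)[6:-4]))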
cur_num = int(os.path.basename(files[-1])[6:-4])
cur_num += 1
fh = open("somedir/sample%s.xml" % cur_num, 'w')
rs = [blockresult]
fh.writelines(rs)
fh.close()

This will also keep incrementing, even if some of the lower numbered files disappear.

The other solution here that I like (pointed out by Eiyrioü) is the idea of keeping a temporary file that contains your most recent number:

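with open('somedir/curr_num.txt', 'r') as temp_fh:
    curr_num = int(temp_fh.readline().strip())
curr_num += 1
# store the new number so that the next run continues from it
with open('somedir/curr_num.txt', 'w') as temp_fh:
    temp_fh.write(str(curr_num))
fh = open("somedir/sample%s.xml" % curr_num, 'w')
rs = [blockresult]
fh.writelines(rs)
fh.close()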
Vorticity
1

Another example using recursion

import os
def checkFilePath(testString, extension, currentCount):
    if os.path.exists(testString + str(currentCount) +extension):
        return checkFilePath(testString, extension, currentCount+1)
    else:
        return testString + str(currentCount) +extension

Use:

checkFilePath("myfile", ".txt" , 0)
chumbaloo
1

I needed to do something similar, but for output directories in a data processing pipeline. I was inspired by Vorticity's answer, but added use of regex to grab the trailing number. This method continues to increment the last directory, even if intermediate numbered output directories are deleted. It also adds leading zeros so the names will sort alphabetically (i.e. width 3 gives 001 etc.)

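import logging
import os
import re
from glob import glob

log = logging.getLogger(__name__)  # assumed logger; the original relies on an existing `log`

def get_unique_dir(path, width=3):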
    # if it doesn't exist, create
    if not os.path.isdir(path):
        log.debug("Creating new directory - {}".format(path))
        os.makedirs(path)
        return path

    # if it's empty, use
    if not os.listdir(path):
        log.debug("Using empty directory - {}".format(path))
        return path

    # otherwise, increment the highest number folder in the series

    def get_trailing_number(search_text):
        serch_obj = re.search(r"([0-9]+)$", search_text)
        if not serch_obj:
            return 0
        else:
            return int(serch_obj.group(1))

    dirs = glob(path + "*")
    num_list = sorted([get_trailing_number(d) for d in dirs])
    highest_num = num_list[-1]
    next_num = highest_num + 1
    new_path = "{0}_{1:0>{2}}".format(path, next_num, width)

    log.debug("Creating new incremented directory - {}".format(new_path))
    os.makedirs(new_path)
    return new_path

get_unique_dir("output")
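
The zero padding in that function comes from the nested format spec `{1:0>{2}}`: fill with `0`, right-align, to a width taken from the third argument. A minimal isolated sketch (the function name `padded_name` is hypothetical):

```python
def padded_name(path, number, width=3):
    """Append `number` to `path`, left-padded with zeros to `width` digits."""
    return "{0}_{1:0>{2}}".format(path, number, width)


print(padded_name("output", 7))     # output_007
print(padded_name("output", 1234))  # output_1234 (width is only a minimum)
```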
Woods26
0

Here is one more example. The code tests whether a file exists in the directory; if it does, it increments the last index in the file name and saves the file under the new name. The typical file name has the form <three-letter month><day>_<index>.dat, e.g. May10_1.dat

import time
import datetime
import shutil
import os
import os.path


da=datetime.datetime.now()

data_id =1
ts = time.time()
st = datetime.datetime.fromtimestamp(ts).strftime("%b%d")
data_id=str(data_id)
filename = st+'_'+data_id+'.dat'
while (os.path.isfile(str(filename))):
    data_id=int(data_id)
    data_id=data_id+1
    print(data_id)
    filename = st+'_'+str(data_id)+'.dat'
    print(filename)


shutil.copyfile('Autonamingscript1.py',filename)

f = open(filename,'a+')
f.write("\n\n\n")
f.write("Data comments: \n")


f.close()
0

Continues sequence numbering from the given filename with or without the appended sequence number.

The given filename will be used if it doesn't exist, otherwise a sequence number is applied, and gaps between numbers will be candidates.

This version is quick if the given filename is not already sequenced or is the sequentially highest numbered pre-existing file.

for example the provided filename can be

  • sample.xml
  • sample-1.xml
  • sample-23.xml

import os
import re

def get_incremented_filename(filename):
    name, ext = os.path.splitext(filename)
    seq = 0
    # continue from existing sequence number if any
    rex = re.search(r"^(.*)-(\d+)$", name)
    if rex:
        name = rex[1]
        seq = int(rex[2])
    
    while os.path.exists(filename):
        seq += 1
        filename = f"{name}-{seq}{ext}"
    return filename

david binette
0

Use numbered_filename('sample-*.xml')

Python does not have a routine to find the next filename in a numbered sequence, so I wrote a simple module (see below). Usage is:

from numbered_filename import numbered_filename

fn = numbered_filename('sample-*.xml')
fh = open(fn, 'w')
rs = [blockresult]
fh.writelines(rs)
fh.close()

The first time the code is run, the output will be to sample-000.xml. The next run will write to sample-001.xml, then sample-002.xml, and so on. Each subsequent run increments the sequence number by one.

The module code

Save the following code into a file called numbered_filename.py.

"""Provide a function for creating sequentially incremented filenames
based upon a simple template in which an asterisk is replaced with a
number. The filesystem is checked for existing files that match the
template and the returned filename's sequence number is always one
greater than the maximum found. 
"""

import glob
if __debug__:
    import os

def numbered_filename(template :str ='', width :int =3) -> str:
    """Return the next filename in an incrementing sequence by adding
    one to the current largest number in existing filenames.

    template :str: a string with an asterisk in it representing where
                   the numbers are placed. ('foo-*.txt').

       width :int: optional minimum number of digits to zero-pad the
                   sequence to. Defaults to 3 ('000', '001', '002', ...)

    Example usage:

        from numbered_filename import numbered_filename
        newfile = numbered_filename('foo-*.txt')
        with open(newfile, 'w') as outfile:
            outfile.write("Bob's your uncle!")

    Given a filename template with an asterisk in it, such as
    'foo-*.txt', returns the same filename with the asterisk replaced
    with the next number in the sequence, such as 'foo-007.txt'. If no
    prior file exists, numbering starts at zero ('foo-000.txt').

    The number will be left-padded with zeroes to contain at least
    three digits, unless the optional 'width' argument is given.
    Zero-padding can be disabled with 'width=0'. For example,
    'numbered_filename("hackerb*", width=0)' might return 'hackerb9'.
    Note that 'width' is a minimum and more digits will be used if
    necessary. (E.g., 'foo-1000.txt').

    Regardless of the 'width' setting, existing filenames need not be
    zero-padded to be recognized. For example, if a directory has the
    file 'foo-6.txt', the next filename will be 'foo-007.txt'.

    This routine always returns the next higher number after any
    existing file, even if a lower number is available. For example,
    in a directory containing only 'foo-099.txt', the next file would
    be 'foo-100.txt', despite 'foo-000' through '-098.txt' being possible.

    Peculiar Circumstances: If the template is the empty string (''),
    then the output will simply be a sequence number ('007'). If the
    template contains no asterisks ('foo'), then the number is
    appended to the end of the filename ('foo007'). If more than one
    asterisk is used ('*NSYNC*.txt'), then only the rightmost asterisk
    is replaced with a number ('*NSYNC007.txt'). All other asterisks
    are kept as literal '*' in the filename.

    CAVEAT: While the code attempts to return an unused filename, it
    is not guaranteed as there is a fairly obvious race condition. To
    avoid it, processes writing to the same directory concurrently
    must not use the same template. Do not use this to create temp
    files in a directory where an adversary may have write access,
    such as /tmp -- instead use 'mkstemp'.
    """

    if not isinstance(template, str):
        raise TypeError("numbered_filename() requires a string as a template, such as foo-*.txt")

    (filename, asterisk, extension) =  template.rpartition('*')
    if not asterisk:
        (filename, extension) = (extension, filename)
        template=f'{filename}*'

    try:
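        # Slice off the prefix and suffix: str.lstrip()/rstrip() remove
        # sets of characters, not substrings, and could eat digits.
        files = [int(f[len(filename):len(f) - len(extension)])
                 for f in glob.glob(template)
                 if f[len(filename):len(f) - len(extension)].isdigit()]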
        num = sorted(files)[-1]
    except (IndexError, ValueError):
        num = -1

    num = num + 1
    spec = f'0>{width}'
    numstr = format(num, spec)

    if __debug__:
        result = filename + numstr + extension
        if os.path.exists(result):
            raise AssertionError(f'Error: "{result}" already exists. Race condition or bug?')

    return filename + numstr + extension

Race condition warning

This module solves the problem described in the question, however, it makes no pretense of being secure. If your program is creating temporary files in a directory which an adversary has write access to, such as /tmp, you should use mkstemp() instead of numbered_filename().
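
For that temporary-file case, here is a minimal sketch using the standard library's `tempfile.mkstemp`, which creates the file atomically and returns an open file descriptor together with the path. The wrapper name `write_temp_xml` and the `sample-` prefix are illustrative assumptions.

```python
import os
import tempfile


def write_temp_xml(text, directory=None):
    """Write `text` to a freshly created, uniquely named .xml file."""
    # mkstemp creates the file itself, so no other process can claim
    # the name between choosing it and opening it.
    fd, path = tempfile.mkstemp(prefix="sample-", suffix=".xml", dir=directory)
    with os.fdopen(fd, "w") as fh:
        fh.write(text)
    return path
```

Note that the resulting names are random rather than sequential, and the caller is responsible for deleting the file.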

hackerb9
0

I came across a similar task.
This is what I came up with to create unique filenames with an automatically determined, running number.

from pathlib import Path
from glob import glob

targetPath = Path('tmp').resolve() / 'targetFile.txt'
if cnt := len(glob(f"{targetPath.parent}/{targetPath.stem}*{targetPath.suffix}")):
    targetPath = Path(targetPath.parent / f"{targetPath.stem}_{cnt}{targetPath.suffix}")

with open(targetPath,"w") as f:
    ...
twil
-1

My 2 cents: an always increasing, macOS-style incremental naming procedure

  • get_increased_path("./some_new_dir").mkdir() creates ./some_new_dir ; then
  • get_increased_path("./some_new_dir").mkdir() creates ./some_new_dir (1) ; then
  • get_increased_path("./some_new_dir").mkdir() creates ./some_new_dir (2) ; etc.

If ./some_new_dir (2) exists but not ./some_new_dir (1), then get_increased_path("./some_new_dir").mkdir() creates ./some_new_dir (3) anyways, so that indexes always increase and you always know which is the latest


from pathlib import Path
import re

def get_increased_path(file_path):
    fp = Path(file_path).resolve()

    # Match "<stem> (<n>)<suffix>"; re.escape() keeps characters such as "."
    # in the name from being interpreted as regex syntax
    pattern = re.compile(
        r"^{} \((\d+)\){}$".format(re.escape(fp.stem), re.escape(fp.suffix))
    )

    vals = []
    for n in fp.parent.glob("{}*".format(fp.stem)):
        m = pattern.match(n.name)
        if m:
            vals.append(int(m.group(1)))
    if vals:
        ext = " ({})".format(max(vals) + 1)
    elif fp.exists():
        ext = " (1)"
    else:
        ext = ""

    # Insert the counter before the suffix so the extension is not duplicated
    return fp.parent / (fp.stem + ext + fp.suffix)

ted
  • tried the code out using python 3.5, had a few bugs and also the results doesn't remove the file extention, it just adds the file extention to the whole filename. – Flying Turtle May 25 '20 at 07:16