226

I need to get the latest file in a folder using Python. When I use this code:

max(files, key = os.path.getctime)

I am getting the below error:

FileNotFoundError: [WinError 2] The system cannot find the file specified: 'a'

Karl Knechtel
garlapak

10 Answers

526

Whatever is assigned to the files variable is incorrect. Use the following code.

import glob
import os

list_of_files = glob.glob('/path/to/folder/*')  # '*' matches everything; use e.g. '*.csv' for a specific format
latest_file = max(list_of_files, key=os.path.getctime)
print(latest_file)
oberbaum
Marlon Abeykoon
  • 4
    What if instead of a file I want to find the latest created/modified folder ? – lucians Sep 08 '17 at 15:36
  • 4
    @Link the same code works for that. If you want to check whether it's a folder or not, you can check `if os.path.isdir(latest_file):` – Marlon Abeykoon Sep 11 '17 at 04:23
  • This part you added, where have to be put ? At the end of the code above or instead of it ? Thank you. – lucians Sep 11 '17 at 09:17
  • 1
    The `latest_file` variable can contain either a directory or a file. So if you want to check whether the latest modified entry is a folder, you can add the above `if` condition at the end of the script. That way you can ignore it if it's a file and put the logic to run for a folder inside the `if`. – Marlon Abeykoon Sep 11 '17 at 09:21
  • 9
    Weird. I had to use "min" to get the latest file. Some searching around hinted that it's os specific. – Graeck Dec 12 '17 at 23:53
  • 28
    This is an excellent answer--THANK YOU! I like to work with `pathlib.Path` objects more than strings and os.path. With pathlib.Path objects your answer becomes: ``list_of_paths = folder_path.glob('*'); latest_path = max(list_of_paths, key=lambda p: p.stat().st_ctime)`` – Phil Apr 25 '18 at 22:42
  • 1
    @MarlonAbeykoon I would suggest using `glob.iglob()` instead of `glob.glob()`, as `glob.iglob()` returns an iterator which yields the same values as `glob()` without actually storing them all simultaneously, which means `glob.iglob()` will be more memory-efficient. See the answer below: [link](https://stackoverflow.com/a/50605125/7918560) – BreakBadSP Jul 02 '18 at 07:15
  • `glob.iglob()` will be more efficient, Please check answer https://stackoverflow.com/a/50605125/7918560 – BreakBadSP Oct 08 '18 at 05:34
  • 4
    @phil You can still use `os.path.getctime` as key, even with `Path` objects. – Berislav Lopac Nov 20 '18 at 14:11
  • 1
    Superb answer. really helpful – DeshDeep Singh Feb 25 '19 at 15:39
  • using the above, but improving list_of_files to be OS agnostic with `.join` => `list_of_files = glob.glob(os.path.join('path/to/folder', '*'))` – D.L Jul 10 '20 at 13:51
  • @Graeck macOS 11.6 works smoothly with max() – Alex Zubkov Aug 18 '23 at 18:18
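The comments above ask how to get the latest folder rather than the latest file. A minimal sketch of that, assuming the same glob approach; the function name `latest_dir` and its `key` parameter are illustrative and not part of the original answer:

```python
import glob
import os


def latest_dir(folder, key=os.path.getctime):
    """Return the newest subdirectory of `folder`, or None if there is none."""
    # glob returns files and directories alike, so keep only directories.
    candidates = [p for p in glob.glob(os.path.join(folder, '*'))
                  if os.path.isdir(p)]
    # default=None avoids a ValueError when no subdirectory exists.
    return max(candidates, key=key, default=None)
```

Passing `key=os.path.getmtime` ranks by modification time instead, which may be what you want if `max` with `getctime` gives surprising results on your OS.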
75
max(files, key = os.path.getctime)

is quite incomplete code. What is files? It probably is a list of file names, coming out of os.listdir().

But this list lists only the filename parts (a. k. a. "basenames"), because their path is common. In order to use it correctly, you have to combine it with the path leading to it (and used to obtain it).

Such as (untested):

import os

def newest(path):
    # os.listdir() returns bare names, so join each one with the directory.
    files = os.listdir(path)
    paths = [os.path.join(path, basename) for basename in files]
    return max(paths, key=os.path.getctime)
glglgl
  • 3
    I am sure the downvoters can explain what exactly is wrong. – glglgl Sep 06 '16 at 11:36
  • 6
    Dunno, tested for you, it does seem to work. On top of that, you were the only one to care to explain a bit. Reading the accepted answer made me think that 'glob' thing was needed, whereas it's absolutely not. Thanks – Arnaud P Dec 13 '17 at 17:16
  • is there a way to only select certain type of files, such as CSV? – David Sep 26 '18 at 09:53
  • 5
    @David Of course. Just insert `if basename.endswith('.csv')` into the list comprehension. – glglgl Sep 26 '18 at 12:14
  • `glob.iglob()` will be more efficient and better way, Please check answer https://stackoverflow.com/a/50605125/7918560 – BreakBadSP Oct 08 '18 at 05:37
  • 1
    @BreakBadSP If you want flexibility, you are right. If you are restricted to a certain directory, I don't see how yours can possibly more efficient. But sometimes, readability is more important than efficiency, so yours might indeed be better in that sense. – glglgl Oct 08 '18 at 11:03
  • 2
    Thanks for this, I've used this in so many of my ETL functions! – Umar.H Jun 15 '19 at 19:57
  • I had such an error: File "/usr/lib/python3.6/genericpath.py", line 65, in getctime return os.stat(filename).st_ctime FileNotFoundError: [Errno 2] No such file or directory: 'train-00085-of-00096.tfrecord' – zheyuanWang Jun 20 '20 at 21:33
  • @zheyuanWang Then you did something wrong. (Sorry, the vagueness of my answer matches the vagueness in your question.) Best thing would be to open a separate question with more complete description of your code and the content of your variables at the time of code execution. – glglgl Jun 22 '20 at 08:43
  • using glob.glob function to replace os.listdir function( as in Marlon Abeykoon's answer) works well by me – zheyuanWang Jun 25 '20 at 20:42
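The CSV filter suggested in the comments can be folded into the `os.listdir()` version. This is only a sketch: the `suffix` parameter and the `default=None` fallback are additions for illustration. Joining each name with `path` is also what avoids the `FileNotFoundError` reported above, since a bare name is otherwise looked up relative to the current working directory.

```python
import os


def newest(path, suffix=''):
    """Return the newest entry in `path` whose name ends with `suffix`."""
    paths = [os.path.join(path, basename)
             for basename in os.listdir(path)
             if basename.endswith(suffix)]
    # default=None: return None instead of raising when nothing matches.
    return max(paths, key=os.path.getctime, default=None)
```

For example, `newest('/path/to/folder', '.csv')` would consider only CSV files.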
29

I lack the reputation to comment, but ctime from Marlon Abeykoon's response did not give the correct result for me. Using mtime does the trick, though (key=os.path.getmtime):

import glob
import os

list_of_files = glob.glob('/path/to/folder/*')  # '*' matches everything; use e.g. '*.csv' for a specific format
latest_file = max(list_of_files, key=os.path.getmtime)
print(latest_file)

I found two answers for that problem:

  • python os.path.getctime max does not return latest
  • Difference between python - getmtime() and getctime() in unix system
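The difference comes from what ctime means: on Unix it is the time of the last metadata change, while on Windows it is the creation time; mtime is the last content modification on both. To see which one matches your idea of "latest", a small helper like this can print both for a file (the name `describe_times` is made up for this sketch):

```python
import os
from datetime import datetime


def describe_times(path):
    """Return (ctime, mtime) of `path` as datetime objects."""
    st = os.stat(path)
    return (datetime.fromtimestamp(st.st_ctime),
            datetime.fromtimestamp(st.st_mtime))
```

If the two values differ for your files, for example after copying them with preserved timestamps, that would explain why `getctime` and `getmtime` pick different "latest" files.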

Pikamander2
crlf
16

I've been using this in Python 3, including pattern matching on the filename.

from pathlib import Path

def latest_file(path: Path, pattern: str = "*"):
    files = path.glob(pattern)
    return max(files, key=lambda x: x.stat().st_ctime)
Jamie Bull
  • 2
    This would be even better if the max arg default was added to support no files matching the path/pattern - max (and min) raise ValueError in that situation so better to set a default - requires python 3.4+ – nickjb Jun 29 '22 at 20:18
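The `default` suggestion from the comment could look like this. It is a sketch of that one change only; the `Optional[Path]` return annotation is an addition.

```python
from pathlib import Path
from typing import Optional


def latest_file(path: Path, pattern: str = "*") -> Optional[Path]:
    files = path.glob(pattern)
    # default=None makes max() return None instead of raising ValueError
    # when the pattern matches nothing.
    return max(files, key=lambda x: x.stat().st_ctime, default=None)
```

Callers then need to handle `None`, e.g. `if (p := latest_file(Path('.'), '*.csv')) is not None: ...` on Python 3.8+.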
14

I would suggest using glob.iglob() instead of glob.glob(), as it is more memory-efficient.

glob.iglob() returns an iterator which yields the same values as glob() without actually storing them all simultaneously.

This means the full list of matches never has to be held in memory at once, which matters mostly for very large directories.

I mostly use the code below to find the latest file matching my pattern:

LatestFile = max(glob.iglob(fileNamePattern),key=os.path.getctime)


NOTE: max() has two forms. To find the latest file we use the iterable form: max(iterable, *[, key, default])

which takes a single iterable as its first argument. To find the maximum of several separate values, such as numbers, use the other form: max(arg1, arg2, *args[, key])

BreakBadSP
  • 3
    I like this `max()` sort. In my case, I used a different `key=os.path.basename` since the filenames had timestamps in them. – MarkHu Dec 11 '19 at 18:35
  • In your example, if I want to include the folder path for the fileNamePattern, how to do it? – FMFF Feb 16 '23 at 17:34
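To include the folder path, as asked in the last comment, one option is to build the pattern with `os.path.join`. A sketch, where `latest_matching` and its parameters are hypothetical names:

```python
import glob
import os


def latest_matching(folder, pattern='*'):
    """Return the newest path in `folder` matching `pattern`, or None."""
    # Join folder and pattern so iglob yields full paths, not bare names.
    full_pattern = os.path.join(folder, pattern)
    return max(glob.iglob(full_pattern), key=os.path.getctime, default=None)
```

For example, `latest_matching('/path/to/folder', '*.csv')`.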
6

Try sorting the items by creation time. The example below sorts the files in a folder newest first and takes the first element, which is the latest.

import glob
import os

files_path = os.path.join(folder, '*')  # folder: path of the directory to scan
files = sorted(
    glob.iglob(files_path), key=os.path.getctime, reverse=True)
print(files[0])  # raises IndexError if the folder is empty
turkus
5

Most of the answers are correct, but if you need the latest two or three files, they either fail or need to be modified.

I found the sample below more useful, as the same code can return the latest 2, 3, or n files.

import glob
import os

folder_path = "/Users/sachin/Desktop/Files/"
files_path = os.path.join(folder_path, '*')
files = sorted(glob.iglob(files_path), key=os.path.getctime, reverse=True)
print(files[0])  # latest file
print(files[0], files[1])  # latest two files
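If only the n newest files are needed, `heapq.nlargest` from the standard library is an alternative to sorting the whole listing. This is a sketch of that swapped-in technique, not the code above; `latest_n` and its `key` parameter are illustrative names:

```python
import glob
import heapq
import os


def latest_n(folder, n, key=os.path.getctime):
    """Return up to `n` paths from `folder`, newest first."""
    paths = glob.iglob(os.path.join(folder, '*'))
    # nlargest keeps only the n best candidates while iterating.
    return heapq.nlargest(n, paths, key=key)
```

`latest_n(folder_path, 2)` would return the latest two files; if the folder holds fewer than n entries, the list is simply shorter.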
Sachin
3

A much faster method on Windows (about 0.05 s in my case) is to call a batch script that does this:

get_latest.bat

@echo off
for /f "delims=" %%i in ('dir \\directory\in\question /b/a-d/od/t:c') do set LAST=%%i
echo %LAST%

where \\directory\in\question is the directory you want to investigate.

get_latest.py

from subprocess import Popen, PIPE
p = Popen("get_latest.bat", shell=True, stdout=PIPE,)
stdout, stderr = p.communicate()
print(stdout, stderr)

If it finds a file, stdout holds the file name as bytes, and stderr is None because it is not redirected to a pipe.

Use stdout.decode("utf-8").rstrip() to get the usable string representation of the file name.

ic_fl2
  • Not sure why this attracting down votes, for those that need to do this task quickly this is the fastest method I could find. And sometimes it is necessary to do this very quickly. – ic_fl2 Nov 01 '18 at 07:51
  • Have an upvote. I'm not doing this in Windows, but if you're looking for speed, the other answers require an iteration of all files in a directory. So if shell commands in your OS that specify a sort order of the listed files are available, pulling the first or last result of that *should* be faster. – Jim Hunziker Nov 08 '18 at 18:11
  • 1
    Thanks I'm actually more concerned with a better solution than this (as in similarly fast but pure python) so was hoping someone could elaborate on that. – ic_fl2 Nov 21 '18 at 08:00
  • 3
    Sorry, but I had to downvote, and I'll give you the courtesy of explaining reasons why. The biggest reason is that it is not using python (not cross-platform) thus broken unless ran under Windows. Secondly, this is not a "faster method" (unless faster means quick-and-dirty-not-bothering-to-read-docs) --shelling out to another script is notoriously slow. – MarkHu Dec 11 '19 at 16:33
  • 1
    @MarkHu Actually this script was born out of the necessity to check a large folder's content quickly from a python script. So in this case faster method means, gets the file name of newest folder the fastest (or faster than a pure python method). Feel free to add a similar script for linux, probably based on `ls -Art | tail -n 1`. Please evaluate the performance of a solution before making claims about it. – ic_fl2 Jan 17 '20 at 13:04
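For the pure-Python alternative asked about in the comments, `os.scandir()` is one candidate, since it can reuse file information gathered during the directory listing instead of issuing a separate stat call per name. The sketch below has not been benchmarked against the batch script, and it ranks by modification time rather than the creation time used by `/t:c`; the function name is hypothetical.

```python
import os


def latest_file_scandir(folder):
    """Return the path of the most recently modified regular file, or None."""
    with os.scandir(folder) as entries:
        # Skip directories, like the /a-d switch in the batch script.
        files = [e for e in entries if e.is_file()]
        latest = max(files, key=lambda e: e.stat().st_mtime, default=None)
    return latest.path if latest is not None else None
```

Whether this is actually faster on a given large folder is something to measure, for example with `timeit`.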
1

(Edited to improve answer)

First define a function get_latest_file

def get_latest_file(path, *paths):
    fullpath = os.path.join(path, *paths)
    ...
get_latest_file('example', 'files','randomtext011.*.txt')

You may also add a docstring:

def get_latest_file(path, *paths):
    """Returns the name of the latest (most recent) file 
    of the joined path(s)"""
    fullpath = os.path.join(path, *paths)

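You can also use glob.iglob instead of glob.glob, but an iterator is always truthy, so the `if not files` check in the complete code would then no longer detect an empty result.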

Complete code to return the name of latest file:

import glob
import os

def get_latest_file(path, *paths):
    """Returns the name of the latest (most recent) file
    of the joined path(s)"""
    fullpath = os.path.join(path, *paths)
    files = glob.glob(fullpath)  # a list, so the emptiness check below works
    if not files:                # nothing matched the pattern
        return None
    latest_file = max(files, key=os.path.getctime)
    _, filename = os.path.split(latest_file)
    return filename
Naeem Ul Wahhab
1

I tried the suggestions above and my program crashed. Then I figured out that the file I was trying to identify was in use, and calling 'os.path.getctime' on it crashed. What finally worked for me was:

    files_before = glob.glob(os.path.join(my_path,'*'))
    # ... code where the new file is created ...
    new_file = set(files_before).symmetric_difference(set(glob.glob(os.path.join(my_path,'*'))))

This code gets the entries that appear in only one of the two file listings, so it also reports files deleted in between. It's not the most elegant, and if multiple files are created at the same time it probably won't be reliable.
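The same before/after idea can be wrapped in two small helpers. This sketch uses a plain set difference, so unlike `symmetric_difference` it reports only entries that were added; `snapshot` and `new_entries` are hypothetical names.

```python
import os


def snapshot(folder):
    """Return the set of entry names currently in `folder`."""
    return set(os.listdir(folder))


def new_entries(folder, before):
    """Return full paths of entries that appeared since `before` was taken."""
    return {os.path.join(folder, name) for name in snapshot(folder) - before}
```

Usage would be `before = snapshot(my_path)`, then the code that creates the file, then `new_entries(my_path, before)`. No timestamps are read, so a file that is still open elsewhere should not be a problem for this approach.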

AlexFink