0

I've used glob.glob to list files in a directory.

files = glob.glob("C:/Folder/*.csv")

I want to narrow that list to only the files modified in the last 60 days, but I'm not sure how to do that. I came across os.path.getmtime() on Google, but I can't work out how to apply it to files = glob.glob("C:/Folder/*.csv")

Any ideas?

Josh Fox
  • Use the os module to get details on the files, e.g. os.path.getmtime(path). – Rob Jul 14 '21 at 12:05
  • 1
    `glob()` will return a list of paths, you apply `getmtime()` on each and filter the old ones out. – bereal Jul 14 '21 at 12:05
  • Pass each file from the list *files* into **os.path.getmtime()**. This would give us the file modification date, which could be used in `time/datetime` module to compare to a date that is 60 days ago. – Vasu Deo.S Jul 14 '21 at 12:10
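
The datetime route suggested in the comments above could look roughly like the sketch below. The helper name `modified_within` and its `days` parameter are made up for illustration; only `glob.glob`, `os.path.getmtime`, `datetime` and `timedelta` come from the standard library.

```python
import glob
import os
from datetime import datetime, timedelta


def modified_within(paths, days=60):
    """Return the paths whose modification time falls within the last `days` days.

    Hypothetical helper: the name and signature are assumptions, not from the thread.
    """
    cutoff = datetime.now() - timedelta(days=days)
    # getmtime() returns seconds since the epoch; convert to a local datetime to compare.
    return [p for p in paths
            if datetime.fromtimestamp(os.path.getmtime(p)) >= cutoff]


# The pattern is the one from the question; glob returns [] if nothing matches.
files = glob.glob("C:/Folder/*.csv")
recent_files = modified_within(files)
print(recent_files)
```

Note that `getmtime()` is called on one path at a time, never on the whole list returned by `glob.glob()`.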

3 Answers

5

Building on what you already provided and what you already know about os.path.getmtime(), you can use the time.time() function to get the current time. Subtract the modified time from the current time to get the difference in seconds, then divide by (60*60*24), the number of seconds in a day, to convert it to days.

The following code does each of those steps:

import glob
import os
import time

files = glob.glob("C:/Folder/*.csv")
modified_files = list()
current_time = time.time()

for csv_file in files:
    time_delta = current_time - os.path.getmtime(csv_file)
    time_delta_days = time_delta / (60 * 60 * 24)
    if time_delta_days < 60:
        modified_files.append(csv_file)

print(modified_files)

Edit: A more pythonic way to write this might be:

import glob
import os
import time

def test_modified(filename):
    delta = time.time() - os.path.getmtime(filename)
    delta = delta / (60*60*24)
    if delta < 60:
        return True
    return False

mfiles = [mfile for mfile in glob.glob("C:/Folder/*.csv") if test_modified(mfile)]
print(mfiles)
JiyuuSensei
  • Thank you. This is where my brain was trying to lead me but I couldn't work it out due to my lack of experience. The crucial things you did, that I have learnt, were that I could make an empty list and then append the items which met the days criteria from the files list to that empty list. I was trying to find a function to help me remove items from the files list. – Josh Fox Jul 14 '21 at 15:50
  • Another mistake I made when trying to solve the problem was that I had done the equivalent of ```os.path.getmtime(files)``` rather than ```os.path.getmtime(csv_file)``` and I couldn't work out why I kept getting errors. – Josh Fox Jul 14 '21 at 15:50
  • The way you handled the date was also much more elegant than what I had thought about trying. It makes total sense to just work with seconds and convert to days, rather than with datetimes, which are messy. Also, thank you for writing the more pythonic way. My brain is not quite wrapped around this one, or around defining functions in general, but as an example specific to me it should be a great learning tool. Thank you. – Josh Fox Jul 14 '21 at 15:52
  • Glad I could help. Yes, datetimes can be a pain sometimes but on the other hand, a useful tool to convert many different formats to datetime up front and then handle them all in the same manner. Here it wasn't necessary as the tools you were already using were both in the same format. Lucky! The edit I made consists of two things: a list comprehension, and a function. The function just checks if a given file is modified within the last 60 days or not. It returns either `True` or `False`. We can use this as a condition in list comprehensions, as it just returns a boolean. – JiyuuSensei Jul 15 '21 at 07:34
  • Great answer! One question though. Imagine you have a folder with thousands of files saved there over the past 20years. Is there a better way to get the files modified over the past for example 60 days without looping through each single file? – Angelo Jan 25 '22 at 19:29
  • @Angelo, yes you can by utilising [this answer](https://stackoverflow.com/questions/23430395/glob-search-files-in-date-order). You can start by using `glob.glob(pathname)` to get a list of all path names. Afterwards, you sort it on the modified time. Then you go through your loop as usual, but you can break out of the loop when your 60 day limit has been reached. || **EDIT:** although I'm uncertain whether or not the sort would still count as a loop, as it's calculating the modified time for every file in order to do the sort? So actually I think I might be wrong here. – JiyuuSensei Feb 02 '22 at 13:54
  • 1
    Thanks for the answer. I would just replace the if block with a ternary expression, like this: `return True if delta < 60 else False` – Konstantin May 15 '22 at 12:46
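
On the question of a folder with thousands of files: as far as I know, each file's modification time still has to be inspected one by one, since nothing in the standard library looks files up by date. What can help is making each check cheaper. A minimal sketch with `os.scandir`, whose entries can often answer `stat()` from data already read with the directory listing (notably on Windows). The function name `recent_csv_files` is an assumption for illustration.

```python
import os
import time


def recent_csv_files(folder, days=60):
    """Return paths of .csv files in `folder` modified within the last `days` days.

    Hypothetical helper: the name and signature are assumptions, not from the thread.
    """
    cutoff = time.time() - days * 24 * 60 * 60
    recent = []
    with os.scandir(folder) as entries:
        for entry in entries:
            # The extension check is case-insensitive here, which may differ
            # from how glob matches "*.csv" on case-sensitive file systems.
            if (entry.is_file()
                    and entry.name.lower().endswith(".csv")
                    and entry.stat().st_mtime >= cutoff):
                recent.append(entry.path)
    return recent
```

It would be called as `recent_csv_files("C:/Folder")`. Unlike `glob.glob`, `os.scandir` raises `FileNotFoundError` if the folder does not exist.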
1

Example with pathlib module:

from pathlib import Path
import time

folder = Path(r"d:\temp")
files = list(folder.glob("*.csv"))
sixty_days_ago = time.time() - (60*60*24) * 60

fresh_files = [f for f in files if f.stat().st_mtime > sixty_days_ago]

for f in fresh_files:
    print(f)
Yuri Khristich
-1

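You could use sort to order the elements in the folder by last modification time:

import glob
import os

files = glob.glob(folder_path)  # folder_path is a glob pattern such as the question's "C:/Folder/*.csv"
files.sort(key=os.path.getmtime, reverse=True)  # newest first

This gives you the list of items in the folder ordered by last modification time, newest first. Sorting alone does not drop anything, so each modification time still has to be compared against a 60-day cutoff.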
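
To turn the sorted list into the 60-day subset the question asks for, one option is to stop at the first file that is too old. A rough sketch, assuming a made-up helper name `newest_first_within`; `itertools.takewhile` is standard library.

```python
import glob
import os
import time
from itertools import takewhile


def newest_first_within(pattern, days=60):
    """Return files matching `pattern` modified within `days` days, newest first.

    Hypothetical helper: the name and signature are assumptions, not from the thread.
    """
    cutoff = time.time() - days * 24 * 60 * 60
    files = glob.glob(pattern)
    files.sort(key=os.path.getmtime, reverse=True)
    # The list is newest first, so everything after the first old file is old too.
    return list(takewhile(lambda f: os.path.getmtime(f) >= cutoff, files))
```

For the question's folder this would be `newest_first_within("C:/Folder/*.csv")`. The sort already reads every file's modification time, so this is not faster than a plain filter; it is only useful when the newest-first order is wanted as well.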

fsimonjetz
Roberta