I am using Python to load a CSV file for processing.
The directory contains many files and is constantly updated. When I run the script, I want it to select only the most recently updated csv file in the directory for processing.
I have code that seems to do this, but not reliably. Often it takes the latest CSV file, as intended, but sometimes it takes an older file and skips the most recent one. I suspect sorted() is ordering the filenames alphanumerically rather than by created/updated time.
Can someone please suggest a change to the code that would make it work more reliably?
Current code:
# Import Python modules
import pandas as pd
import os
# Identify last CSV file in directory
last_csv = sorted(list(filter(lambda x: '.csv' in x, os.listdir())))[-1]
# Load CSV into a pandas DataFrame
df = pd.read_csv(last_csv, skip_blank_lines=False, header=[8], engine='python')
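To illustrate the suspected cause: sorted() on a list of strings orders them lexicographically and never looks at file timestamps. The filenames below are hypothetical, chosen only to show the effect; they are not from the real directory.

```python
# Hypothetical filenames, for illustration only.
names = ["report_9.csv", "report_10.csv", "notes.txt", "Archive.csv"]

# Keep only names ending in ".csv" and sort them the way sorted() does by default.
csv_names = sorted(n for n in names if n.endswith(".csv"))

print(csv_names)
# ['Archive.csv', 'report_10.csv', 'report_9.csv']

# The "last" entry is report_9.csv, regardless of which file was written most recently.
print(csv_names[-1])
# report_9.csv
```

So the element at [-1] depends only on how the names compare as strings, which would explain why an older file is sometimes picked.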
I've seen Bash and Java versions in other threads, but is there a way to do it with Python?
The thread "Python combining all csv files in a directory and order by date time" describes how to sort a list of files, but it works by parsing the date and time from the filename. I want to be able to find the latest file even if the updated time is not part of the filename.
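A minimal sketch of the intended behaviour, assuming "most recently updated" means the file system's last-modification time (st_mtime). The helper name latest_csv is hypothetical, and matching on the file suffix instead of '.csv' in x is also an assumption (it avoids picking up names such as data.csv.bak).

```python
from pathlib import Path


def latest_csv(directory="."):
    """Return the most recently modified .csv file in `directory`, or None.

    Hypothetical helper: it compares modification times (st_mtime),
    not filenames, so the names do not need to contain a date.
    """
    csv_files = [
        p for p in Path(directory).iterdir()
        if p.is_file() and p.suffix.lower() == ".csv"
    ]
    if not csv_files:
        return None  # no CSV files found
    # max() with a key picks the file with the largest modification timestamp.
    return max(csv_files, key=lambda p: p.stat().st_mtime)
```

The returned path could then be passed to pd.read_csv in place of last_csv. Whether modification time or creation time is the right criterion depends on how the files are written to the directory, and creation time is not available in a portable way on every platform.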
Thanks all