I have a script that gets all the files from a particular location, but I need to fetch only the latest ones. The script should return the latest files present at that location.

For example, the location contains files named as follows:

DataLogs_20141125_AP.CSV   
DataLogs_20141125_UK_EARLY.CSV  
DataLogs_20141125_CAN.CSV  
DataLogs_20141125_US.CSV 
DataLogs_20141125_EUR.CSV  
DataLogs_20141125_US_2.CSV 
DataLogs_20141126_AP.CSV   
DataLogs_20141126_UK_EARLY.CSV
DataLogs_20141126_CAN.CSV  
DataLogs_20141126_US.CSV
DataLogs_20141126_EUR.CSV  
DataLogs_20141126_US_2.CSV

I want to fetch only the latest files. For example, the files whose names contain "20141126" are the latest ones.

I tried a match pattern, but it gives me all the files:

filematch ='DataLogs_2014_*.CSV'
Rohit
  • You will have to use `datetime` in Python here to compare dates. The format of your date needs to be fixed for that to work, so that it can be split and passed to `datetime` for comparison. See http://stackoverflow.com/questions/8142364/how-to-compare-two-dates – vks Nov 27 '14 at 04:50
  • By virtue of the ASCII gods, if your text dates are in the format "YYYYMMDD", they will sort correctly without needing to be parsed into datetime objects. – monkut Nov 27 '14 at 07:25
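
To illustrate the two comments above, here is a minimal sketch comparing both approaches. The `dates` list is made up for the example; it assumes every date is a fixed-width, zero-padded "YYYYMMDD" string.

```python
from datetime import datetime

# Hypothetical sample of date strings, all in fixed-width YYYYMMDD form.
dates = ["20141125", "20141126", "20141031"]

# Plain string comparison: works because the format is fixed-width and zero-padded,
# so lexicographic order and chronological order coincide.
latest_as_text = max(dates)

# Parsing with datetime picks the same entry, and raises ValueError on malformed input.
latest_as_date = max(dates, key=lambda d: datetime.strptime(d, "%Y%m%d"))

print(latest_as_text, latest_as_date)
```

If the dates were written in another layout (say "DD-MM-YYYY"), the string comparison would no longer be safe and parsing would be the way to go.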

3 Answers

You could do this:

  1. Get the latest date by splitting each file name on "_", sorting the date parts in reverse order, and taking the first element.
  2. Using the latest date, select all the files whose names contain it.

    fileList = ['DataLogs_20141125_AP.CSV', 'DataLogs_20141125_UK_EARLY.CSV',  'DataLogs_20141125_CAN.CSV',  'DataLogs_20141125_US.CSV', 'DataLogs_20141125_EUR.CSV',  'DataLogs_20141125_US_2.CSV', 'DataLogs_20141126_AP.CSV',
        'DataLogs_20141126_UK_EARLY.CSV','DataLogs_20141126_CAN.CSV',  'DataLogs_20141126_US.CSV','DataLogs_20141126_EUR.CSV',  'DataLogs_20141126_US_2.CSV']
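    # Dates are fixed-width YYYYMMDD strings, so a reverse sort puts the newest one first.
    latest = sorted(map(lambda x:x.split('_')[1],fileList), reverse=True)[0]
    # list(...) makes the result print as a list on both Python 2 and Python 3.
    print(list(filter(lambda x:x.find(latest)!=-1, fileList)))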
    

Output:

['DataLogs_20141126_AP.CSV', 'DataLogs_20141126_UK_EARLY.CSV', 'DataLogs_20141126_CAN.CSV', 'DataLogs_20141126_US.CSV', 'DataLogs_20141126_EUR.CSV', 'DataLogs_20141126_US_2.CSV']
venpa

You can do as follows:

data = """DataLogs_20141125_AP.CSV   
DataLogs_20141125_UK_EARLY.CSV  
DataLogs_20141125_CAN.CSV  
DataLogs_20141125_US.CSV 
DataLogs_20141125_EUR.CSV  
DataLogs_20141125_US_2.CSV 
DataLogs_20141126_AP.CSV   
DataLogs_20141126_UK_EARLY.CSV
DataLogs_20141126_CAN.CSV  
DataLogs_20141126_US.CSV
DataLogs_20141126_EUR.CSV  
DataLogs_20141126_US_2.CSV"""


print(list(fname for fname in data.split() if '20141126' in fname))

Gives:

['DataLogs_20141126_AP.CSV', 'DataLogs_20141126_UK_EARLY.CSV', 'DataLogs_20141126_CAN.CSV', 'DataLogs_20141126_US.CSV', 'DataLogs_20141126_EUR.CSV', 'DataLogs_20141126_US_2.CSV']

For a more general solution, i.e. one that searches for the latest date, you can do as @user3 recommends.
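
As a rough sketch of what such a general solution might look like when reading straight from a directory: the function name `latest_files`, the `DATE_RE` pattern, and the assumption that every file follows the `DataLogs_<YYYYMMDD>_<suffix>.CSV` scheme are mine, not part of the question.

```python
import glob
import os
import re

# Assumed naming scheme: DataLogs_<YYYYMMDD>_<anything>.CSV
DATE_RE = re.compile(r"^DataLogs_(\d{8})_.*\.CSV$")

def latest_files(directory):
    """Return the names of the files in `directory` that carry the newest date."""
    dated = []
    for path in glob.glob(os.path.join(directory, "DataLogs_*.CSV")):
        name = os.path.basename(path)
        match = DATE_RE.match(name)
        if match:  # skip names that do not follow the assumed scheme
            dated.append((match.group(1), name))
    if not dated:
        return []
    # YYYYMMDD strings compare chronologically, so max() finds the newest date.
    newest = max(date for date, _ in dated)
    return sorted(name for date, name in dated if date == newest)
```

Calling `latest_files('/path/to/logs')` (a placeholder path) would then return only the file names stamped with the most recent date, or an empty list if nothing matches.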

Marcin

You could also use itertools.groupby to group files by the date in the filename.

from itertools import groupby

file_list = ['DataLogs_20141125_AP.CSV', 'DataLogs_20141125_UK_EARLY.CSV',  'DataLogs_20141125_CAN.CSV',  'DataLogs_20141125_US.CSV', 'DataLogs_20141125_EUR.CSV',  'DataLogs_20141125_US_2.CSV', 'DataLogs_20141126_AP.CSV',
    'DataLogs_20141126_UK_EARLY.CSV','DataLogs_20141126_CAN.CSV',  'DataLogs_20141126_US.CSV','DataLogs_20141126_EUR.CSV',  'DataLogs_20141126_US_2.CSV']

def group_key_func(value):
    """Function to pull out and return the key value to group by in the filename"""
    return value.split("_")[1]  # pulls out '20141126' in 'DataLogs_20141126_CAN.CSV'

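# groupby only groups adjacent items, so sort by the same key first
# (sorted is stable, so the original order within each date is kept)
newest_date, newest_files = sorted([(group_key, list(group)) for group_key, group in groupby(sorted(file_list, key=group_key_func), key=group_key_func)], reverse=True)[0]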

Resulting newest date and files:

20141126: 
DataLogs_20141126_AP.CSV
DataLogs_20141126_UK_EARLY.CSV
DataLogs_20141126_CAN.CSV
DataLogs_20141126_US.CSV
DataLogs_20141126_EUR.CSV
DataLogs_20141126_US_2.CSV
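
Note that `groupby` only merges adjacent items, so it depends on the input being ordered by date. If that is not guaranteed, a dictionary-based grouping is one alternative. The sketch below is a swapped-in technique rather than the approach above, and the shortened, deliberately shuffled `file_list` is made up for illustration.

```python
from collections import defaultdict

# Hypothetical, deliberately unsorted sample of file names.
file_list = ['DataLogs_20141126_AP.CSV', 'DataLogs_20141125_AP.CSV',
             'DataLogs_20141126_US.CSV', 'DataLogs_20141125_US.CSV']

# Map each date string to the list of files carrying it; input order does not matter.
files_by_date = defaultdict(list)
for name in file_list:
    files_by_date[name.split('_')[1]].append(name)

# max() over the dict iterates its keys; YYYYMMDD strings compare chronologically.
newest_date = max(files_by_date)
newest_files = files_by_date[newest_date]
print(newest_date, newest_files)
```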
monkut