
I was able to generate a list with my filenames:

allFiles =['https://myurl.com/something//something_01-01-2020.csv', 'https://myurl.com/something//something_01-02-2020.csv', 'https://myurl.com/something//something_01-03-2020.csv'...]

How can I find the filename with the earliest date (the date embedded in the file name) in this list and extract that date as a variable?

Additional scenario: What if my list contains names in both the 00-00-0000.csv and 00-0000.csv formats?

Baobab1988
  • "its date" is ambiguous - creation date, last modified date, last accessed date? Look at https://stackoverflow.com/questions/237079/how-to-get-file-creation-modification-date-times-in-python – buran Sep 08 '20 at 11:28
  • @buran Those are for local files. The question is about URLs. An HTTP HEAD request will be needed. Unless the intention is just to extract the date from the filename. – alani Sep 08 '20 at 11:29
  • @alani, yes, you are right. and also it may be that OP wants the date in the name :-) – buran Sep 08 '20 at 11:30
  • Baobab1988: Please can you edit the question to clarify whether you just want the date that is part of the filename, or whether you want the date when the server reports that the file was last actually updated. – alani Sep 08 '20 at 11:31

3 Answers


Try this:

from datetime import datetime

min(allFiles,
    key=lambda x: datetime.strptime(x.split('_')[1].replace('.csv', ''), "%d-%m-%Y"))
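
To also keep the date as its own variable, as the question asks, one option is to parse each name with a small helper and reuse it on the winning entry. This is only a minimal sketch, assuming every name ends in `_dd-mm-yyyy.csv`; the helper name `date_from_name` and the three-item sample list are made up for illustration:

```python
from datetime import datetime

# Sample list mirroring the question (the real list is longer).
allFiles = [
    'https://myurl.com/something//something_01-03-2020.csv',
    'https://myurl.com/something//something_01-01-2020.csv',
    'https://myurl.com/something//something_01-02-2020.csv',
]

def date_from_name(url):
    """Parse the dd-mm-yyyy date after the last underscore (hypothetical helper)."""
    stem = url.rsplit('_', 1)[-1]            # e.g. '01-01-2020.csv'
    return datetime.strptime(stem[:-len('.csv')], '%d-%m-%Y').date()

earliest_file = min(allFiles, key=date_from_name)   # URL with the earliest date
earliest_date = date_from_name(earliest_file)       # that date as a datetime.date
```

Splitting on the last underscore with `rsplit` keeps this working if an earlier part of the URL happens to contain an underscore.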
sushanth
  • Thanks! this is what I was looking for :) and would you be able to help also with a scenario where some of the files have _00-00-0000.csv and some _00-0000.csv format? – Baobab1988 Sep 08 '20 at 11:43
  • @Baobab1988 If you have another question then please ask another question, including all the information necessary to answer it, including whether the order is supposed to be dd-mm-yyyy or mm-dd-yyyy and how any items that omit the day are supposed to be sorted relative to those that contain it. – alani Sep 08 '20 at 11:49

Try this:

import re
from datetime import datetime

sorted(allFiles, key=lambda x: datetime.strptime(re.search(r"[0-9]{2}-[0-9]{2}-[0-9]{4}", x)[0], '%d-%m-%Y'))

This returns a new list sorted by the date in each filename, so the first element of the result is the earliest file.
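
For the additional scenario in the question, where some names carry a day and others do not, a regex with an optional day group is one possible approach. The sketch below rests on two assumptions the question does not settle: the order is dd-mm-yyyy, and a name without a day is treated as the first of that month. The names `DATE_RE`, `parse_date` and `sample` are hypothetical:

```python
import re
from datetime import date

# Optional "dd-" part, then "mm-yyyy", anchored to the end of the name.
DATE_RE = re.compile(r'(?:(\d{2})-)?(\d{2})-(\d{4})\.csv$')

def parse_date(url):
    match = DATE_RE.search(url)
    if match is None:
        raise ValueError(f'no date found in {url!r}')
    day, month, year = match.groups()
    # ASSUMPTION: a name without a day sorts as the first of that month.
    return date(int(year), int(month), int(day or 1))

sample = [
    'https://myurl.com/something//something_15-02-2020.csv',
    'https://myurl.com/something//something_02-2020.csv',
    'https://myurl.com/something//something_03-01-2020.csv',
]

earliest_file = min(sample, key=parse_date)
earliest_date = parse_date(earliest_file)
```

If day-less names should instead sort after the dated ones of the same month, only the fallback value in `parse_date` needs to change.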

Mehrdad Pedramfar

It seems that you want to pick the file based on the date in its filename.

You can do:

min(allFiles, key=lambda x:x[-8:-4]+x[-11:-9]+x[-14:-12])

The lambda function here obtains a string such as '20200101' from the filename, by extracting the relevant parts and concatenating them in the correct order. Ordering by this string will produce date order.
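
For a single name, the slices work out as follows. This is just an illustration assuming the dd-mm-yyyy.csv ending; the variable names are made up for the example:

```python
name = 'https://myurl.com/something//something_01-02-2020.csv'

year = name[-8:-4]      # '2020' (the 4 characters before '.csv')
month = name[-11:-9]    # '02'
day = name[-14:-12]     # '01'

sort_key = year + month + day
print(sort_key)         # 20200201
```

Because every part is zero-padded to a fixed width, comparing these strings alphabetically gives the same order as comparing the dates.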

This is based on the assumption that the filename adheres to a format of ending in dd-mm-yyyy.csv. If the dates were intended in fact to be mm-dd-yyyy.csv (not clearly specified in the question), then this would need to be changed to:

min(allFiles, key=lambda x:x[-8:-4]+x[-14:-12]+x[-11:-9])
alani