2

I have some xml files in a folder as example 'assests/2020/2010.xml', 'assests/2020/20005.xml', 'assests/2020/20999.xml' etc. I want to get the filename with max value in the '2020' folder. For above three files output should be 20999.xml

I am trying as following:


import glob
import os

list_of_files = glob.glob('assets/2020/*')
# latest_file = max(list_of_files, key=os.path.getctime)
# print (latest_file)

I couldn't be able to find logic to get the required file. Here is the resource that have best answer to my query but I couldn't build my logic.

Aven Desta
  • 2,114
  • 12
  • 27
solo
  • 71
  • 7
  • You can find your answer [here](https://stackoverflow.com/questions/3207219/how-do-i-list-all-files-of-a-directory) – Aven Desta Dec 24 '20 at 12:59
  • Do you the filename that is lexicgraphically highest, or the one with the latest creation time (as your code suggests) or the one with the largest integer before `.xml`? Put another way, which do you consider higher, 20888 or 2090? – BoarGules Dec 24 '20 at 13:25

5 Answers5

1

You can use pathlib to glob for the xml files and access the Path object attributes like .name and .stem:

from pathlib import Path

list_of_files = Path('assets/2020/').glob('*.xml')
print(max((Path(fn).name for fn in list_of_files), key=lambda fn: int(Path(fn).stem)))

Output:

20999.xml
Terry Spotts
  • 3,527
  • 1
  • 8
  • 21
0

I can't test it out right now, but you may try this:

files = []
for filename in list_of_files:
    filename = str(filename)
    filename = filename.replace('.xml','') #Assuming it's not printing your complete directory path
    filename = int(filename)
    files += [filename]

print(files)

This should get you your filenames in integer format and now you should be able to sort them in descending order and get the first item of the sorted list.

Prateek Jain
  • 231
  • 1
  • 11
0

Use re to search for the appropriate endings in your file paths. If found use re again to extract the nr.

import re

list_of_files = [
    'assests/2020/2010.xml',
    'assests/2020/20005.xml',
    'assests/2020/20999.xml'
    ]

highest_nr = -1
highest_nr_file = ''
for f in list_of_files:
    re_result = re.findall(r'\d+\.xml$', f)
    if re_result:
        nr = int(re.findall(r'\d+', re_result[0])[0])
        if nr > highest_nr:
            highest_nr = nr
            highest_nr_file = f

print(highest_nr_file)

Result

assests/2020/20999.xml 
Mace
  • 1,355
  • 8
  • 13
0
import glob

xmlFiles = [] 
# this will store all the xml files in your directory
for file in glob.glob("*.xml"):
    xmlFiles.append(file[:4])

# this will print the maximum one
print(max(xmlFiles))
Aven Desta
  • 2,114
  • 12
  • 27
0

You can also try this way.

import os, re

path = "assests/2020/"

files =[
"assests/2020/2010.xml",
"assests/2020/20005.xml",
"assests/2020/20999.xml"
]

n = [int(re.findall(r'\d+\.xml$',file)[0].split('.')[0]) for file in files]


output = str(max(n))+".xml"
print("Biggest max file name of .xml file is ",os.path.join(path,output))

Output:

Biggest max file name of .xml file is  assests/2020/20999.xml
Ice Bear
  • 2,676
  • 1
  • 8
  • 24