1

I'm developing a timelapse camera on a read-only filesystem that writes images to a USB stick. It has no real-time clock and no internet connection, so I can't use the date and time to keep the files in temporal order and prevent overwriting.

I could store the images as 1.jpg, 2.jpg, 3.jpg and so on and update the counter in a file last.txt on the USB stick, but I'd rather avoid that, so I'm trying to calculate the last filename at boot. However, with 9.jpg and 10.jpg in the folder, print(max(glob.glob('/home/pi/Desktop/timelapse/*'))) returns 9.jpg. I also think glob would be slow with thousands of files. How can I solve this?

EDIT

I found this solution:

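import glob
import os

highest = 0  # named "highest" to avoid shadowing the built-in max()
for name in glob.glob('/home/pi/Desktop/timelapse/*.jpg'):
    # file name without directory and extension, e.g. '10' for '.../10.jpg'
    n = int(os.path.splitext(os.path.basename(name))[0])
    if n > highest:
        highest = n
print(highest)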

but it takes about 3 s per 10,000 files. Is there a faster solution, apart from dividing the files into sub-folders?

Miky
  • `str(max(int(os.path.splitext(os.path.basename(x))[0]) for x in glob.glob('/home/pi/Desktop/timelapse/*'))) + ".jpg"` strips the directory and .jpg from each filename and converts the rest to an integer, then finds the max and puts .jpg back. This assumes the filenames are only integers, all of them are .jpg files, and there is nothing else in the folder. If you have more relaxed assumptions, I can come up with more robust strategies. NOTE: this is by no means the fastest or best approach – D_Serg Jun 29 '19 at 16:39
  • Thousands of files in one directory will always be slow. I would consider separating the files into different directories. Maybe a new directory on each boot? – Andrej Kesely Jun 29 '19 at 16:40
  • @D_Serg the filenames are only integers and all of them are .jpg files and there is nothing else other than them in the folder, what would be the fastest and best approach? – Miky Jun 29 '19 at 17:02
  • @AndrejKesely it would be nice to make one directory per day, but without a clock I can't. Why should it be a problem if the files are ordered? – Miky Jun 29 '19 at 17:02
  • @Miky, is there a specific reason why you're avoiding `last.txt`? – D_Serg Jun 29 '19 at 17:14
  • @D_Serg just because I'd be overwriting the same file over and over, which is very likely to trigger wear leveling – Miky Jun 29 '19 at 17:20
  • @JohnHennig I've read that answer but in my case I can't use time methods as getctime – Miky Jun 29 '19 at 20:54

4 Answers

2

Here:

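import os

# splitext() drops the extension; names are assumed to be purely numeric, e.g. '10.jpg'
latest_file_index = max(int(os.path.splitext(f)[0]) for f in os.listdir('path_to_folder_goes_here'))

Another idea is to use the length of the file list (assuming all files in the folder are the jpg files and the numbering starts at 1 with no gaps):

latest_file_index = len(os.listdir('path_to_folder_goes_here'))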
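Under the same assumption (files numbered 1, 2, 3, ... with no gaps), the last index can also be found without listing the folder at all, by probing for individual file names. The following is only a sketch of that idea, not something measured on the Pi: `last_index` is a hypothetical helper name, and whether it actually beats `os.listdir` depends on how the filesystem on the USB stick looks up a single name.

```python
import os

def last_index(folder):
    """Return the highest n such that <folder>/<n>.jpg exists, or 0 if there is none.

    Assumption: files are numbered 1, 2, 3, ... with no gaps.
    """
    def exists(n):
        return os.path.exists(os.path.join(folder, f"{n}.jpg"))

    if not exists(1):
        return 0

    # Double the probe until it steps past the last file.
    low, high = 1, 2
    while exists(high):
        low, high = high, high * 2

    # Now <low>.jpg exists and <high>.jpg does not: bisect between them.
    while high - low > 1:
        mid = (low + high) // 2
        if exists(mid):
            low = mid
        else:
            high = mid
    return low
```

This needs a number of existence checks that grows with the logarithm of the file count rather than with the count itself. It gives wrong results if files in the middle of the sequence have been deleted.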
balderman
  • Isn't it slow to iterate over thousands of files with listdir? – Miky Jun 29 '19 at 17:04
  • @Miky If you find it slow, why don't you split the data into folders where each folder represents a time period like 10 hours? – balderman Jun 29 '19 at 17:08
  • This doesn't work if there are characters in the filename. – Mark Moretto Jun 29 '19 at 17:11
  • @MarkMoretto That is right, but the assumption is that the file names are 1.jpg, 2.jpg ... n.jpg, so there is no real problem. – balderman Jun 29 '19 at 17:18
  • True and your solution works for that. I just assumed that OP would be adding characters at some point. I actually updated my post to just go by creation date (in Windows) instead of numerical filenames, which can be altered. – Mark Moretto Jun 29 '19 at 17:23
  • @balderman I get `Traceback (most recent call last): File "last1.py", line 8, in latest_file_index = max([int(f[:f.index('.')]) for f in os.listdir('/home/pi/Desktop/timelapse')]) ValueError: substring not found` – Miky Jun 29 '19 at 19:30
  • @Miky How about 'latest_file_index = len(os.listdir(dir))'? Assuming all files in the folder are the image files. – balderman Jun 29 '19 at 19:32
  • @Miky About the ValueError: it looks like there is a file whose name does not contain a dot. Is that true? – balderman Jun 29 '19 at 19:53
0

You need to extract the numbers from the filenames and convert them to integers to get proper numeric ordering.

For example like so:

from pathlib import Path

folder = Path('/home/pi/Desktop/timelapse')
highest = max(int(file.stem) for file in folder.glob('*.jpg'))

For more complicated file-name patterns this approach could be extended with regular expressions.
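As a rough sketch of such an extension, assume the names carry a non-digit prefix in front of the counter, e.g. img_10.jpg. The pattern, the prefix convention and the function name `highest_index` are illustrative assumptions, not part of the question:

```python
import re
from pathlib import Path

# Hypothetical naming scheme: optional non-digit prefix, then the counter, then ".jpg".
PATTERN = re.compile(r'^\D*(\d+)\.jpg$')

def highest_index(folder):
    """Return the largest counter among matching file names, or 0 if none match."""
    matches = (PATTERN.match(path.name) for path in Path(folder).iterdir())
    # Names that do not fit the pattern yield None and are skipped.
    return max((int(m.group(1)) for m in matches if m), default=0)
```

The `default=0` argument keeps `max()` from raising on an empty folder, so the next file would simply get counter 1.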

john-hen
0

Using re:

import re

filenames = [
    'file1.jpg',
    'file2.jpg',
    'file3.jpg',
    'file4.jpg',
    'fileA.jpg',
    ]

### We match any non-digit prefix followed by a number and '.jpg'.
### The number is captured in a parenthesized group, so we extract it with group(1)
### and convert it to int, so that e.g. 10 ranks above 9.
### Names that don't match (like 'fileA.jpg') are skipped by the conditional 'if'.
### max() with a key then returns the match whose number is largest;
### its .string attribute is the original file name.
matches = [re.match(r'\D*(\d+)\.jpg$', name) for name in filenames]
max_file = max((m for m in matches if m), key=lambda m: int(m.group(1))).string

print(f'The file with the maximum number is: {max_file}')

Output:

The file with the maximum number is: file4.jpg

Note: This will work whether there are letters before the number in the filename or not, so you can name the files (pretty much) whatever you want.

Second solution: Use the creation date.

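This is similar to the first, but we'll use the os module and iterate over the directory, returning the file with the latest creation date. Note that os.path.getctime returns the creation time on Windows; on Unix it is the time of the last metadata change.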

import os

_dir = r'C:\...\...'

# Build the full path for each entry so getctime() works from any working directory
max_file = max(os.listdir(_dir), key=lambda x: os.path.getctime(os.path.join(_dir, x)))
Mark Moretto
0

You can use os.walk(), because it gives you the list of filenames it finds. Strip the '.jpg' extension from each name, cast the remainder to int, and append it to a list; a simple call to max() then does the rest.

import os

# taken from https://stackoverflow.com/questions/3207219/how-do-i-list-all-files-of-a-directory
_, _, filenames = next(os.walk(os.getcwd()), (None, None, []))
values = []

for filename in filenames:
    try:
        values.append(int(filename.lower().replace('.jpg','')))
    except ValueError:
        pass  # not a file with format x.jpg

max_value = max(values)
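One thing to watch: max(values) raises ValueError when the folder contains no numbered files yet, which is exactly the state on the camera's first boot. A possible variant that returns the next file name directly is sketched below; the function name `next_filename` and the choice to start counting at 1 are assumptions made for illustration.

```python
import os

def next_filename(folder):
    """Return the name for the next image, e.g. '11.jpg' if '10.jpg' is the highest so far."""
    _, _, filenames = next(os.walk(folder), (None, None, []))
    values = []
    for filename in filenames:
        stem, ext = os.path.splitext(filename)
        if ext.lower() != '.jpg':
            continue  # not a jpg file
        try:
            values.append(int(stem))
        except ValueError:
            pass  # not a file with format x.jpg
    # default=0 covers the empty folder, so the first image becomes 1.jpg
    return f"{max(values, default=0) + 1}.jpg"
```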
crissal