How do I compare similar file names with timestamps in the names to see which is the newest in Bash?

Question

I am new to Bash scripting, open to constructive criticism, and want to learn. I'm working on a Bash script to automate moving backup files to an archive server. The plan is to run the script monthly to copy backups from storage server 1 to storage server 2. These backups are generated on the first Sunday of every month, and I need to copy the full backups from storage server 1 to storage server 2 on the following Monday.

I want to loop through my directory of backup files and find the ones that were created most recently and match the file extension I'm looking for. The .vbi extension is an incremental backup, and I don't care about copying these; I only want to copy the most recently created .vbk files, which are full backups. There should only ever be 2 files in the parent directory with names that match apart from the timestamp and the random 4-character section.

The last 5 characters before the file extension don't matter for my purposes (I'm not really sure what they represent), and the last 22 characters of the filename before .vbk are the section that differs between files. To clarify, the filename is ('server name' - 'Server IP' D yyyy-mm-dd T hhmmss _ xxxx). For files with a matching ('server name' - 'Server IP') prefix, I want to compare the time section (D yyyy-mm-dd T hhmmss).

I have most of this figured out, but I'm struggling with this one piece. This is an example of what the directory looks like:

-rw-r--r-- 1 root root    0 Jul  1 10:20 'Webserver - 10.10.0.60D2023-07-01T003026_u153.vbk'
-rw-r--r-- 1 root root    0 Jul  8 08:32 'WebServer - 10.10.0.60D2023-07-08T002832_g842.vbk'
-rw-r--r-- 1 root root    0 Jul  8 07:23 'WebServer - 10.10.0.60D2023-07-08T023216_f264.vbi'
-rw-r--r-- 1 root root    0 Jul  1 10:10 'SQLServer - 10.10.0.4D2023-07-01T021049_8fj3.vbk'
-rw-r--r-- 1 root root    0 Jul  8 05:20 'SQLServer - 10.10.0.4D2023-07-08T012046_k860.vbk'
-rw-r--r-- 1 root root    0 Jul  8 11:04 'SQLServer - 10.10.0.4D2023-07-08T042046_9ju7.vbi'

I want to grab the files on line 2 and line 5 because they are the most recently created backups, and have the .vbk extension.

I can get a list of just the .vbk files already by running this.

for i in *.vbk;
do
    [ -f "$i" ] || break
    echo "$i"
done

and I get this list

'Webserver - 10.10.0.60D2023-07-01T003026_u153.vbk'
'WebServer - 10.10.0.60D2023-07-08T002832_g842.vbk'
'SQLServer - 10.10.0.4D2023-07-01T021049_8fj3.vbk'
'SQLServer - 10.10.0.4D2023-07-08T012046_k860.vbk'

How can I loop through this list and create a list of only the 2 newest backups, given that the _xxxx at the end of the name appears to be random? In this example I want to grab lines 2 and 4. I can compare the timestamps in the file names or the system file times; I believe either will work.

define `newest`; will there always be exactly 2 matching files? what if 3+ files have the same `newest` datetime stamp? are we looking for all files `within the last X days`? what if `newest` is, say, 4 weeks 'old'? please update the question with the additional details — markp-fuso, Jul 09 '23 at 23:07
People hate parsing `ls` output (and with your filenames including spaces and `-`, they are right), but you could still do something like `/bin/ls -tr *.vbk | tail -2`, which sorts by file date/time with the oldest first and presents the last `2` (any number you care to use). Good luck. — shellter, Jul 10 '23 at 00:12
So we have to group these files according to the host which they are for. In every group of files for a given host, there is a newest file according to its date. — Kaz, Jul 10 '23 at 05:48
_newest_ usually means "most recent modification date". If you want to derive instead the _newness_ from the file name, specify how your idea of the "age" of the file is encoded in its name. I don't see anything in the filename which intuitively would look like a timestamp to me. — user1934428, Jul 10 '23 at 05:49
@user1934428 look at `2023-07-01T003026`, its YYYY-MM-DDThhmmss — Nic3500, Jul 10 '23 at 12:39
@markp-fuso The first paragraph has been updated to clarify details. There should only be files from the last 38 days at most. There should never be more than 1 file with the same timestamp. As for which files fall within X days, this can be handled 2 different ways. Each server has its own directory that stores the nightly incremental backups as well as the full backups from the first Sunday of the most recent 2 months, and may have around 38 files, 2 of which are the desired .vbk files. The second option is to grab the 2 .vbk files from each server's directory and compare just those .vbk files. — Jake, Jul 10 '23 at 16:46
@Nic3500 : What is the meaning of the _T_ inside the timestamp string? Something related to the time zone? — user1934428, Jul 11 '23 at 05:46
Look at https://stackoverflow.com/questions/6340794/yyyy-mm-ddthhmmss-what-is-the-meaning-of-t-here, it is just to let you know that what follows is the time portion of the string. — Nic3500, Jul 11 '23 at 15:13

score 1 · Answer 1 · answered Jul 10 '23 at 00:21

This command will extract the date and time of your two latest files:

find data -type f -name "*.vbk" -print | sed 's/.*D\(.*\)_.*/\1/' | sort -n | tail -2

I assume all files are in a directory called "data".
find ...: lists all files named *.vbk
sed ...: extracts the portion between the D and the _. This is where you have your date and time information.
sort -n: sorts the timestamps. You are lucky: the file naming convention used by whatever produces these files has the date and time fields ordered from year down to second, so a simple sort works (a plain lexical sort gives the same order).
tail -2: keeps only the last 2 lines

The result of this command is the following:

2023-07-08T002832
2023-07-08T012046
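
To make the sed step concrete, here is a small sketch that applies the same expression to a single name copied from the question's listing. The variable names `name` and `stamp` are only illustrative and are not part of the original command.

```shell
#!/bin/bash
# Sample name copied from the directory listing in the question.
name='SQLServer - 10.10.0.4D2023-07-08T012046_k860.vbk'

# .*D     greedy: everything up to and including the last "D"
# \(.*\)  the captured timestamp
# _.*     the underscore, the random suffix, and the extension
stamp=$(printf '%s\n' "$name" | sed 's/.*D\(.*\)_.*/\1/')

printf '%s\n' "$stamp"   # prints 2023-07-08T012046
```

Because the fields are fixed width and run from year down to second, sorting these strings as plain text already puts them in chronological order.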

You can then use a while loop to list files:

#!/bin/bash

while IFS= read -r datetime
do
    /bin/ls data/*"${datetime}"*
done < <( find data -type f -name "*.vbk" -print | sed 's/.*D\(.*\)_.*/\1/' | sort -n | tail -2 )

This worked perfectly for displaying the results. I modified it slightly so that I could copy the output from tail into a text file. This is what I ended up with: `/bin/ls /data/*${datetime}* | tee -a /home/jake/VBKList`. Next, I replaced the double quotes around *.vbk with single quotes because I was getting path errors, and removed -print because the tee command above took care of printing to the screen and adding the text to my text file: `done < <( find /data/ -type f -name '*.vbk' | sed 's/.*D\(.*\)_.*/\1/' | sort -n | tail -1)` — Jake, Jul 14 '23 at 15:22

Jay jargot · Answer 2 · 2023-07-11T11:47:37.780

Given this date and time format, ASCII order is the same as chronological order, so sort can be told to start sorting after the D character.
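
As a quick check of that ordering claim, the sketch below compares two timestamps as plain strings inside `[[ ]]`. The helper `stamp_of` and the variable names are hypothetical, and the parameter expansions assume names shaped like those in the question (one capital `D` before the date, one underscore before the random suffix).

```shell
#!/bin/bash
# Two names from the question's listing: same server, different dates.
old='SQLServer - 10.10.0.4D2023-07-01T021049_8fj3.vbk'
new='SQLServer - 10.10.0.4D2023-07-08T012046_k860.vbk'

# Drop everything up to the last "D", then drop the "_xxxx.vbk" tail.
stamp_of() {
  local s=${1##*D}
  printf '%s\n' "${s%_*}"
}

a=$(stamp_of "$old")
b=$(stamp_of "$new")

# Inside [[ ]], < compares strings lexicographically.
if [[ $a < $b ]]; then
  newest=$new
else
  newest=$old
fi
printf '%s\n' "$newest"
```

The same comparison can decide which of two files for one server is newer without calling sort at all.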

Finally, you could iterate over the ordered filenames and use an associative map to only keep the latest backups.

The solution below could break if filenames contain unusual characters (a newline, for example):

unset latest_server_backup
declare -A latest_server_backup
# Names arrive oldest first, so each assignment overwrites an older backup.
while IFS= read -r filename ; do
  server=${filename%% *}   # text before the first space, e.g. WebServer
  server=${server^^}       # upper-case, so Webserver and WebServer match
  latest_server_backup[$server]=${filename}
done < <(find . -type f -name \*.vbk 2>/dev/null | sed 's%^\./%%' | sort -tD -k2)
for server in "${!latest_server_backup[@]}" ; do
  printf "%s\n" "${latest_server_backup[$server]}"
done

Output:

SQLServer - 10.10.0.4D2023-07-08T012046_k860.vbk
WebServer - 10.10.0.60D2023-07-08T002832_g842.vbk
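
If the find, sed, and sort pipeline is a concern, a similar result can probably be had with globbing and string comparison alone. The following is a minimal sketch under stated assumptions, not a drop-in replacement: the function name `latest_vbk_per_server` is hypothetical, it needs bash 4 or later, it assumes every name follows the pattern from the question, and it groups by server name plus IP rather than by server name alone.

```shell
#!/bin/bash
# Sketch: print the newest .vbk per server found directly in a directory.
# Assumed name shape: '<server> - <ip>D<yyyy-mm-dd>T<hhmmss>_<xxxx>.vbk'
latest_vbk_per_server() {
  local dir=$1 path name key stamp
  local -A best_stamp=() best_file=()
  for path in "$dir"/*.vbk; do
    [ -f "$path" ] || continue     # skip the literal pattern if nothing matched
    name=${path##*/}               # strip the directory part
    key=${name%D*}                 # '<server> - <ip>'
    key=${key^^}                   # fold case so Webserver and WebServer match
    stamp=${name##*D}              # '<date>T<time>_<xxxx>.vbk'
    stamp=${stamp%_*}              # '<date>T<time>'
    # Keep this file if it is the first for its key or has a later timestamp.
    if [[ -z ${best_stamp[$key]:-} || $stamp > ${best_stamp[$key]} ]]; then
      best_stamp[$key]=$stamp
      best_file[$key]=$name
    fi
  done
  for key in "${!best_file[@]}"; do
    printf '%s\n' "${best_file[$key]}"
  done
}

# Example call: first argument is the directory, defaulting to the current one.
latest_vbk_per_server "${1:-.}"
```

Associative arrays have no defined iteration order, so pipe the output through sort if the order matters.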
