
I have a list of files with file names that contain a substring of 6 numbers that represents HHMMSS, HH: 2 digits hour, MM: 2 digits minutes, SS: 2 digits seconds.

If the list of files is ordered, the increments should be in steps of 30 minutes, that is, the first substring should be 000000, followed by 003000, 010000, 013000, ..., 233000.

I want to verify that no file is missing by iterating over the list of files and checking that none of these substrings is absent. My approach:

string_check=000000
for file in ${file_list[@]}; do
  if [[ ${file:22:6} == $string_check ]]; then
    echo "Ok"
  else
    echo "Problem: an hour (file) is missing"
    exit 99
  fi
  string_check=$((string_check+3000)) #this is the key line
done

The second-to-last line is the key. I know how to format the result to 6 digits, but I want the addition to behave like a clock, that is, arithmetic modulo 60 on the minutes. How can that be done?

David
  • do all of these 6-digit strings always end in `xx3000` or `xx0000`? could you have strings like `xx1234` and `xx4813`? or do you ignore anything that doesn't end in `xx[03]000`? – markp-fuso Jan 07 '21 at 14:42
  • There should not be strings like `xx1234` and if they are, then an exit (error) should be the output. – David Jan 07 '21 at 15:15
  • answer updated with some `comm` commands that can generate lists of strings: **a)** missing sequence and **b)** invalid strings (eg, `xx1234`) – markp-fuso Jan 07 '21 at 15:46
  • Why not just count 48 files? – Léa Gris Jan 07 '21 at 16:22
  • @LéaGris: yes, it's an option, but I want to be sure the steps are the right ones (in jumps of 30minutes and so on). – David Jan 07 '21 at 16:28
  • @David `shopt -s nullglob; unset arr; arr=(*{00..23}{00,30}00*); if [ "${#arr[@]}" -eq 48 ]; then echo 'ok'; fi` – Léa Gris Jan 07 '21 at 16:32
  • is it possible for more than one file to have the same string (eg, `090000`)? if 'yes' then testing for `count = 48` (obviously) won't work – markp-fuso Jan 07 '21 at 18:43
  • @markp-fuso if one file wrongly match the pattern, then adjust the pattern so it does not match with it. The way of counting matches of a pattern is still valid, just choose the right pattern. – Léa Gris Jan 07 '21 at 20:17
  • @LéaGris not sure I understand what you're saying; assume 2 files named `a.090000.txt` and `b.090000.txt` ... `printf '%.0sx' ./*[0-2][0-9][03]000*` => `xx` (for a count of 2 `x's`) – markp-fuso Jan 07 '21 at 20:24
  • @markp-fuso poster did not mention multiple files of same time. But you are right that it will not work in this case. – Léa Gris Jan 07 '21 at 20:51

2 Answers

Assumptions:

  • all 6-digit strings are of the format xx[03]000 (ie, minutes must be exactly 00 or 30, with no seconds)
  • if there are strings like xx1529 ... these will be ignored (see 2nd half of answer - use of comm - to address OP's comment about these types of strings being an error)

Instead of trying to do a bunch of mod 60 math for the MM (minutes) portion of the string, we can use a sequence generator to generate all the desired strings:

$ for string_check in {00..23}{00,30}00; do echo $string_check; done
000000
003000
010000
013000
... snip ...
230000
233000
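
For comparison, the mod 60 arithmetic that the sequence generator avoids could be sketched roughly as below. This is only an illustration under assumptions: the helper name next_slot is hypothetical (it does not appear in the question), and the idea is to convert HHMMSS to minutes since midnight, add 30, and wrap at 1440 minutes rather than adding 3000 to the string directly.

```shell
#!/usr/bin/env bash

# next_slot is a hypothetical helper: given HHMMSS, print the slot 30 minutes later.
next_slot() {
    local hhmmss=$1
    # Force base 10 so values such as "08" or "09" are not parsed as octal.
    local hh=$((10#${hhmmss:0:2}))
    local mm=$((10#${hhmmss:2:2}))
    local ss=$((10#${hhmmss:4:2}))

    # Work in minutes since midnight and wrap around after 24 hours.
    local total=$(( (hh * 60 + mm + 30) % 1440 ))

    # Re-assemble as zero-padded HHMMSS; the seconds are carried over unchanged.
    printf '%02d%02d%02d\n' $((total / 60)) $((total % 60)) "$ss"
}

next_slot 003000   # prints 010000
next_slot 233000   # prints 000000
```

In the question's loop, the increment line would then become something like string_check=$(next_slot "$string_check"), though the sequence generator above needs no arithmetic at all.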

While OP should be able to add this to the current code, I'm thinking we might go one step further and look at pre-parsing all of the filenames, pulling the 6-digit strings into an associative array (ie, the 6-digit strings act as the indexes), eg:

unset      myarray
declare -A myarray

for file in "${file_list[@]}"     # quote and expand the whole array, not just its first element
do
    myarray[${file:22:6}]+=" ${file}"       # in case multiple files have same 6-digit string
done

Using the sequence generator as the driver of our logic, we can pull this together like so:

for string_check in {00..23}{00,30}00
do
    [[ -z "${myarray[${string_check}]}" ]] &&
    echo "Problem: (file) '${string_check}' is missing"
done

NOTE: OP can decide if the process should finish checking all strings or if it should exit on the first missing string (per OP's current code).
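
The two steps above can be combined into one self-contained sketch. Everything here is illustrative: the helper name find_missing_slots and the sample file names are hypothetical, and the only assumption carried over from the question is that the 6-digit string starts at offset 22 of each file name.

```shell
#!/usr/bin/env bash

# find_missing_slots is a hypothetical helper: it prints every expected
# HHMMSS slot that none of the given file names contains at offset 22.
find_missing_slots() {
    local -A seen=()
    local file key slot
    for file in "$@"; do
        key=${file:22:6}
        # Skip names too short to contain a slot (an empty key is not a valid index).
        [[ -n $key ]] && seen[$key]=1
    done
    for slot in {00..23}{00,30}00; do
        [[ -n ${seen[$slot]:-} ]] || printf '%s\n' "$slot"
    done
}

# Hypothetical sample data: 47 file names, with the 01:30 slot left out.
file_list=()
for slot in {00..23}{00,30}00; do
    [[ $slot == 013000 ]] && continue
    file_list+=("measurement_20210107__${slot}.dat")
done

missing=$(find_missing_slots "${file_list[@]}")
if [[ -n $missing ]]; then
    echo "Problem: missing slot(s): ${missing}"
else
    echo "Ok"
fi
```

To stop at the first gap instead of listing all of them, the printf in the second loop could be followed by a return with a non-zero status.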


One idea for using comm to compare the 2 lists of strings:

# display sequence generated strings that do not exist in the array:

comm -23 <(printf "%s\n" {00..23}{00,30}00) <(printf "%s\n" "${!myarray[@]}" | sort)

# OP has commented that strings not like 'xx[03]000' should generate an error;
# display strings (extracted from file names) that do not exist in the sequence

comm -13 <(printf "%s\n" {00..23}{00,30}00) <(printf "%s\n" "${!myarray[@]}" | sort)

Where:

  • comm -23 - display only the lines from the first 'file' that do not exist in the second 'file' (ie, missing sequences of the format xx[03]000)
  • comm -13 - display only the lines from the second 'file' that do not exist in the first 'file' (ie, filenames with strings not of the format xx[03]000)

These lists could then be used as input to a loop, or passed to xargs, for additional processing as needed. Keep in mind that the comm -13 output displays the indices of the array, while the associated array values contain the name(s) of the original file(s) from which each 6-digit string was derived.
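
As a rough sketch of feeding the comm -13 output into a loop, the snippet below looks up the offending file names for each invalid string. The array contents are made-up sample data, and the variable names invalid and bad are hypothetical.

```shell
#!/usr/bin/env bash

# Hypothetical sample data: two valid strings and one invalid one.
declare -A myarray=(
    [000000]=" a.000000.txt"
    [001234]=" b.001234.txt"
    [003000]=" c.003000.txt"
)

# Collect strings that are indices of the array but not part of the expected sequence.
invalid=()
while IFS= read -r bad; do
    invalid+=("$bad")
    # The array value holds the file name(s) the string was extracted from.
    echo "Problem: unexpected string '${bad}' in:${myarray[$bad]}"
done < <(comm -13 <(printf '%s\n' {00..23}{00,30}00) \
                  <(printf '%s\n' "${!myarray[@]}" | sort))
```

Reading from a process substitution, rather than piping into while, keeps the loop in the current shell, so the invalid array is still populated after the loop ends.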

markp-fuso
  • I think that the code should be `for string_check in {00..23}{00,30}00` because the step is 30 minutes. – David Jan 07 '21 at 15:20

This is easy to do in a POSIX shell using only built-ins:

#!/usr/bin/env sh

# Print an x for each glob matched file, and store result in string_check
string_check=$(printf '%.0sx' ./*[0-2][0-9][03]000*)

# Now string_check length reflects the number of matches
if [ ${#string_check} -eq 48 ]; then
  echo "Ok"
else
  echo "Problem: an hour (file) is missing"
  exit 99
fi

Alternatively:

#!/usr/bin/env sh

if [ "$(printf '%.0sx' ./*[0-2][0-9][03]000*)" \
     = 'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx' ]; then
  echo "Ok"
else
  echo "Problem: an hour (file) is missing"
  exit 99
fi
Léa Gris