
I have files that need to be copied and removed on different days and at different times, and the criteria are part of each file's name. I was thinking of using Bash and regular expressions, combining various variables into a pattern, and then simply using mv. But perhaps a loop of some kind that parses the file names is a better idea.

Say I have a file called: *TuesdayThursdayMonday_1800-1900.txt*

Now let's say $dayofweek is Monday.

I want the criteria to be:

*$dayofweek* must appear before the `_`. The current time must be later than the time to the left of the dash (1800) AND earlier than the time to the right of the dash (1900).

If all this is true, do mv on the file.

anubhava
Paolo
  • Capture `([A-Za-z]+)_(\d+)-(\d+)` and use substring matching on the first group and integer comparison on groups 2 and 3 (after parsing). All of this is only one google away. – AlexR Jul 14 '14 at 17:56
  • @AlexR Do you recommend I group them using cut, or is there another better way of doing it? – Paolo Jul 14 '14 at 18:14
  • Have a look over [here](http://stackoverflow.com/questions/1891797/capturing-groups-from-a-grep-regex). – AlexR Jul 14 '14 at 18:19
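
The capture-group idea from the comments can be sketched in Bash itself with `[[ ... =~ ... ]]` and the `BASH_REMATCH` array. Bash matches POSIX extended regular expressions, so `[0-9]+` stands in for `\d+` here. The function name `parse_name` is made up for illustration and is not part of the question or the answer below.

```bash
#!/bin/bash
# Sketch: split "Days_HHMM-HHMM.ext" into its parts with a regex.
# parse_name is a hypothetical helper name.
parse_name() {
  local re='^([A-Za-z]+)_([0-9]+)-([0-9]+)'
  if [[ $1 =~ $re ]]; then
    # BASH_REMATCH[1..3] hold the three capture groups:
    # the day names, the start time, and the end time.
    echo "${BASH_REMATCH[1]} ${BASH_REMATCH[2]} ${BASH_REMATCH[3]}"
  else
    return 1   # name does not follow the expected format
  fi
}

parse_name "TuesdayThursdayMonday_1800-1900.txt"
```

The day test and the two integer comparisons would then operate on the three printed fields.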

1 Answer

# Function checkfilename:
#   Usage: checkfilename filename dayofweek [time]
#     filename format: dayname..._timestart-timeend.extension
#     (Underscores can optionally appear between the daynames.)
#   Checks if filename contains dayofweek before the (last) underscore
#   and that time is within the time range after the (last) underscore.
#   If time is not given, the current time is used.
#   Code notes:
#     ${var#patt} Removes patt from beginning of $var.
#     ${var%patt} Removes patt from end of $var.
#     10#num interprets num as decimal even if it begins with a 0.

checkfilename() {
  local file day time days days2 times tstart tend

  file="$1"  # filename
  day="$2"   # day of week

  # Check if the first part of the filename contains day.
  days=${file%_*} # just the days
  days2=${days/$day/} # Remove day from the days.
  # If days == days2 then days didn't contain day; return failure.
  if [ "$days" == "$days2" ]; then return 1; fi

  # Get time from 3rd parameter or from date command
  if (($# >= 3)); then time=10#"$3"
  else time=10#$(date +%H%M); fi  # get time in HHMM format

  times=${file##*_}; times=${times%.*}   # just the times
  tstart=10#${times%-*}; tend=10#${times#*-}

  # If second time is less than first time, add 2400
  ((tend < tstart)) && ((tend+=2400))
  # If current time is less than first time, add 2400
  ((time < tstart)) && ((time+=2400))

  # Check if time is between tstart and tend; return result.
  ((tstart <= time && time <= tend))
  return $?
}

file="TuesdayThursdayMonday_2300-0018.txt"
dayofweek="Thursday"
checkfilename "$file" "$dayofweek" 0005 && echo yep
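
To connect the check back to the original goal of moving files, a loop along these lines could be used. This is only a sketch under assumptions: `in_window` is a condensed stand-in for `checkfilename` above, `move_due_files` and its source and destination directory arguments are invented names, and the default day name is assumed to come from `date +%A` in an English locale.

```bash
#!/bin/bash
# in_window file day time
# Condensed stand-in for checkfilename (hypothetical name).
in_window() {
  local base=${1##*/} day=$2 days times
  local re='^[0-9]{4}-[0-9]{4}$'
  local -i now=10#$3 tstart tend
  [[ $base == *_* ]] || return 1            # need an underscore at all
  days=${base%_*}                           # part before the last underscore
  [[ $days == *"$day"* ]] || return 1       # day name must appear there
  times=${base##*_}; times=${times%.*}      # "HHMM-HHMM"
  [[ $times =~ $re ]] || return 1           # reject malformed time ranges
  tstart=10#${times%-*}; tend=10#${times#*-}
  ((tend < tstart)) && ((tend += 2400))     # range crosses midnight
  ((now < tstart)) && ((now += 2400))
  ((tstart <= now && now <= tend))          # exit status is the result
}

# move_due_files srcdir destdir [day] [time]
# Moves every regular file in srcdir whose name matches day and time.
move_due_files() {
  local src=$1 dest=$2
  local day=${3:-$(date +%A)} now=${4:-$(date +%H%M)}
  local f
  for f in "$src"/*; do
    [ -f "$f" ] || continue                 # skip directories and no-match globs
    if in_window "$f" "$day" "$now"; then
      mv -- "$f" "$dest"/
    fi
  done
}

# Example call (directory names are placeholders):
# move_due_files ./incoming ./active
```

Passing the day and time explicitly, as in `move_due_files ./incoming ./active Monday 1830`, makes the loop easy to try out without waiting for the clock.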

If the filename has a prefix to extract as well, it can be done like this:

file="1A_Monday_1800-1900.mp4"

ext=${file##*.}           # remove from front longest  string matching *.
file=${file%.*}           # remove from back  shortest string matching .*
prefix=${file%%_*}        # remove from back  longest  string matching _*
days=${file#*_}           # remove from front shortest string matching *_
days=${days%%_*}          # remove from back  longest  string matching _*
times=${file##*_}         # remove from front longest  string matching *_

echo "$file"
echo "$ext"
echo "$prefix"
echo "$days"
echo "$times"

Note that in the match patterns, '*' matches any number of any character. '.' matches an actual period and '_' matches an actual underscore. Others are '?', matching any single character, [abcd] matching any one of the contained characters, and [^abcd] (or [!abcd]), matching any character except one of the contained characters.

${var#patt} expands to $var with shortest patt match removed from front.
${var##patt} expands to $var with longest patt match removed from front.
${var%patt} expands to $var with shortest patt match removed from end.
${var%%patt} expands to $var with longest patt match removed from end.
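
As a quick check of the four forms, here is a small sketch that applies each of them to the sample name used above. The variable name `f` is arbitrary, and the last line adds a bracket and `?` pattern as an extra illustration.

```bash
#!/bin/bash
# Apply the four removal operators to one sample name.
f="1A_Monday_1800-1900.mp4"

echo "${f#*_}"    # shortest *_ removed from front: Monday_1800-1900.mp4
echo "${f##*_}"   # longest  *_ removed from front: 1800-1900.mp4
echo "${f%_*}"    # shortest _* removed from end:   1A_Monday
echo "${f%%_*}"   # longest  _* removed from end:   1A

# '.' is literal, [a-z] matches one lowercase letter, '?' any one character.
echo "${f%.[a-z][a-z]?}"   # 1A_Monday_1800-1900
```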

A totally different method uses the IFS (internal field separator) shell variable instead of the parameter expansions, splitting the filename on underscores and periods into an array. It assumes the name has exactly the form prefix_days_times.extension.

#!/bin/bash

# Function checkfilename:
#   Usage: checkfilename filename dayofweek [time]
#     filename format: prefix_dayname..._timestart-timeend.extension
#   Checks if filename contains dayofweek between the underscores
#   and that time is within the time range after the second underscore.
#   If time is not given, the current time is used.
#   Code notes:
#     10#num interprets num as decimal even if it begins with a 0.
#     'declare' also makes a variable 'local'
checkfilename() {
  local file="$1"  # filename
  local day="$2"   # day of week

  local IFS='_.'   # Split fields on underscore and period.

  # Split and extract times and days.
  local a=($file)         # Split filename into array.
  local prefix="${a[0]}"  # Set prefix to the first field
  local days="${a[1]}"    # Set days to second field.
  local times="${a[2]}"   # Set times to third field.
  local ext="${a[3]}"     # Set ext to last field.

#  echo -e "\nFile: $file"
#  echo -e "  Prefix: $prefix\n  Days: $days\n  Times: $times\n  Ext: $ext"

  # If days doesn't contain day, return failure.
  if [ "$days" == "${days/$day/}" ]; then return 1; fi

  # Get time from 3rd parameter or from date command
  declare -i time
  if (($# >= 3)); then time=10#"$3"
  else time=10#$(date +%H%M); fi  # Get time in HHMM 24-hr format.

  declare -i tstart=10#${times%-*} tend=10#${times#*-}

  ((tend < tstart)) && ((tend+=2400))
  ((time < tstart)) && ((time+=2400))

  # Check if time is between tstart and tend; return result.
  ((tstart <= time && time <= tend))
  return $?
}

file="1A_TuesdayThursdayMonday_2300-0018.txt"
dayofweek="Thursday"
checkfilename "$file" "$dayofweek" 0005 && echo pass1
checkfilename "$file" "$dayofweek" 0025 || echo pass2
dayofweek="Saturday"
checkfilename "$file" "$dayofweek" 0005 || echo pass3
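
One caveat with `a=($file)`: the unquoted expansion is also subject to pathname expansion, so a name containing `*` or `?` could be expanded against the current directory. A variant that avoids this, offered here as a sketch rather than as part of the answer, uses `read -r -a` with a temporary `IFS`. The helper name `split_name` is invented for illustration.

```bash
#!/bin/bash
# split_name: hypothetical helper that splits prefix_days_times.ext
# on '_' and '.' without triggering globbing.
split_name() {
  local -a parts
  # IFS is changed only for this one read command.
  IFS='_.' read -r -a parts <<< "$1"
  # Expect exactly four fields: prefix, days, times, extension.
  ((${#parts[@]} == 4)) || return 1
  printf '%s\n' "${parts[@]}"
}

split_name "1A_TuesdayThursdayMonday_2300-0018.txt"
```

Checking the field count also rejects names that do not have the assumed four-part layout, which the array version above silently accepts.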
ooga
  • I was going to write something very much like this. The only thing with this answer is that it doesn't assert that the day was before the `_`. – Etan Reisner Jul 14 '14 at 18:30
  • @EtanReisner Yes it does! Look again. – ooga Jul 14 '14 at 18:33
  • No, it doesn't. Try running it against `TuesdayThursday_1200-1500.txtMonday`. – Etan Reisner Jul 14 '14 at 18:37
  • @EtanReisner Oh, I see what you mean. Presumably not a likely filename in this context, but easy to fix. I'll edit the answer. – ooga Jul 14 '14 at 18:50
  • Yeah, I hadn't fully processed what you were doing to see that it effectively covered most possible cases, but yeah, explicit handling is easy enough to be worth it. – Etan Reisner Jul 14 '14 at 18:54
  • @ooga Thank you very much. I've now done something like this, and it works fine. But then I realized… What about crossing midnight? More than 23:00 and less than 01:00 isn't true. Of course, I could change it to "or" instead of "and", but then the "ands" wouldn't be true any more. Any suggestions? – Paolo Jul 14 '14 at 19:01
  • @Paolo Good point! See the edit above. Actually, there's another problem, too: if the hour starts with a 0 it's interpreted as octal! So I removed leading 0's. – ooga Jul 14 '14 at 19:18
  • @ooga Thank you very much for your help. I'm getting an error however, saying "./tester: line 31: ((: 0128: value too great for base (error token is "0128")" where line 31 is "if [ $h -lt $t1 ]; then ((h+=2400)); fi". It seems it was an error with the initial zeroes. I changed your zero removal lines to t1="$(echo $t1 | sed 's/0*//')" and now it seems to work. – Paolo Jul 14 '14 at 23:31
  • @Paolo The error means it's trying to interpret `0128` as octal (since it begins with a `0`), and the digit `8` is illegal in octal, but you probably know that. I don't see why the zero-removal lines wouldn't work. For instance, this works: `t1=0128; t1=${t1#0}; t1=${t1#0}; t1=${t1#0}; t1=${t1:-0}; echo $t1`. (Prints `128`.) Your version will also obviously work, the difference being that it's much less efficient since it forks a subshell and also runs another program. Still, that's not the end of the world. :-) – ooga Jul 14 '14 at 23:59
  • Hmmm… I changed back to your way of doing it, and now it works just as well. I'm not getting the error any more, and I'm not sure why I kept getting it. Thank you for your help. Hope you'll be here tomorrow, in case the error occurs again :) – Paolo Jul 15 '14 at 00:13
  • @Paolo Another edit to clean up the code and obviate the need to remove the leading zeroes. – ooga Jul 15 '14 at 14:39
  • @ooga Once again, thank you very much. It works flawlessly. Question: What is the parsing method called (${file%_*}), and where can I learn more about it? For example, if I add a prefix to the files (e.g 1A_Monday_1800-1900.mp4) and I want the function to ignore/cut off the prefix while analyzing the file, how would that be expressed? – Paolo Jul 15 '14 at 20:19
  • @Paolo They are types of "parameter expansion" and are described in the [GNU Bash manual](http://www.gnu.org/software/bash/manual/html_node/index.html) under the section on [Shell Parameter Expansion](http://www.gnu.org/software/bash/manual/html_node/Shell-Parameter-Expansion.html). See edit above for more info. – ooga Jul 16 '14 at 02:14
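
The octal pitfall discussed in the comments is easy to reproduce in isolation. Below is a small sketch showing the `10#` prefix next to a strip-the-leading-zeros approach. The function names `to_dec` and `strip_zeros` are invented here.

```bash
#!/bin/bash
# to_dec: force base-10 interpretation of a zero-padded number.
to_dec() {
  echo $((10#$1))
}

# strip_zeros: similar result using only parameter expansion.
strip_zeros() {
  local n=$1
  # Drop one leading zero at a time, but always keep the last digit.
  while [[ $n == 0?* ]]; do n=${n#0}; done
  echo "$n"
}

to_dec 0128        # prints 128
strip_zeros 0128   # prints 128
strip_zeros 0000   # prints 0
# echo $((0128))   # would fail: the leading 0 selects octal, and 8 is not an octal digit
```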