
Goal: Regex pattern for use with find and locate that "Contains A but not B"

So I have a bash script that manipulates a few video files.
In its current form, it builds a variable that a later for loop acts on, and this works well:

if [ "$USE_FIND" = true ]; then
    vid_files=$(find "${DIR}" -type f -regex ".*\.\(mkv\|avi\|ts\|mp4\|m2ts\)")
else
    vid_files=$(locate -ir "${DIR}.*\.\(mkv\|avi\|ts\|mp4\|m2ts\)")
fi

So "contains A" is any one of the listed extensions.

I'd like to add a condition so that if a certain string (B) appears in the path, whether in a directory name or the filename, the file isn't added to vid_files.

I've spent some time trying to implement this with lookaheads, to no avail. Taking "Robot" as the example of B ("not contains B"), I've tried different forms of .*(?!Robot).*

e.g. ".*\(\?\!Robot\).*\.\(mkv\|avi\|ts\|mp4\|m2ts\)" with find, but it doesn't work.

I've more or less exhausted regex101.com, the terminal and chmod +x at this point and would welcome some help. I suspect the fact that it's called from a bash script is what's causing me the difficulty.

One of the many references I've consulted while trying to sort this out:
Is there a regex to match a string that contains A but does not contain B
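For reference, the usual way to write "contains A but not B" as a single regex is an anchored negative lookahead, ^(?!.*B).*A. Lookaheads are a Perl/PCRE feature: GNU find's -regex uses emacs syntax by default (the other dialects offered by -regextype have no lookahead either), and locate -r in mlocate takes a POSIX basic regular expression, so neither tool can run such a pattern, whichever shell calls it. Separately, the unanchored .*(?!Robot).* would fail even in PCRE, because the leading .* can stop at a point where "Robot" does not follow. A minimal sketch, assuming GNU grep built with PCRE support (-P) and using hypothetical sample paths:

```bash
#!/usr/bin/env bash
# Sketch: "contains A but not B" as one PCRE pattern, tested with GNU grep -P.
# Assumes GNU grep built with PCRE support; the paths are hypothetical sample data.
paths=$'/videos/Robot Wars/ep1.mkv\n/videos/holiday.mp4\n/videos/notes.txt\n/videos/RobotChicken.avi'

# ^(?!.*Robot)               : from the start of the line, "Robot" must not appear anywhere
# .*\.(mkv|avi|ts|mp4|m2ts)$ : and the line must end in one of the video extensions
matches=$(printf '%s\n' "$paths" | grep -P '^(?!.*Robot).*\.(mkv|avi|ts|mp4|m2ts)$')
printf '%s\n' "$matches"   # /videos/holiday.mp4

# The unanchored form from the question still matches a Robot path:
# .* can stop where "Robot" does not follow, so the lookahead always succeeds.
broken=$(printf '%s\n' '/videos/Robot Wars/ep1.mkv' | grep -cP '.*(?!Robot).*')
echo "unanchored pattern matched $broken line(s)"
```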

Mark

1 Answer


You may want to avoid using find inside a command substitution ($(...)) to build a list of files because, while this is admittedly rare, filenames can contain newlines.

You could use an array, which will handle file names without issues (assuming the array is later expanded properly).

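declare -a vid_files=()
# Read NUL-separated names from find; IFS= and -r keep each name verbatim
while IFS= read -r -d '' file
do
  # Skip any path (directory or file name) that contains "Robot"
  ! [[ "$file" =~ Robot ]] || continue
  vid_files+=("$file")
done < <(find "${DIR}" -type f -regex ".*\.\(mkv\|avi\|ts\|mp4\|m2ts\)" -print0)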
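A variation, assuming GNU find: the "not B" test can also move into find itself as a negated -path test, which avoids lookaheads entirely. -path matches a shell-style pattern against the whole path, so '*Robot*' excludes a hit in either a directory name or the file name (! is the portable spelling; GNU find also accepts -not). The temporary directory and file names below are made up for the demonstration.

```bash
#!/usr/bin/env bash
# Sketch (assumes GNU find): exclude "Robot" paths with a negated -path test.
# The temporary directory and file names are hypothetical test data.
DIR=$(mktemp -d)
mkdir -p "$DIR/Robot Wars" "$DIR/misc"
touch "$DIR/Robot Wars/ep1.mkv" "$DIR/misc/holiday.mp4" \
      "$DIR/misc/notes.txt" "$DIR/misc/RobotChicken.avi"

# -regex keeps the video extensions ("contains A");
# ! -path '*Robot*' drops any path containing Robot ("but not B").
declare -a vid_files=()
while IFS= read -r -d '' file
do
  vid_files+=("$file")
done < <(find "${DIR}" -type f -regex ".*\.\(mkv\|avi\|ts\|mp4\|m2ts\)" ! -path '*Robot*' -print0)

printf '%s\n' "${vid_files[@]}"
rm -rf -- "$DIR"
```

As with the =~ test in the loop, a "Robot" anywhere in ${DIR} itself would exclude every file.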

The -print0 option of find separates the file names with a null byte, and the -d '' option of read makes read use a null byte as its record separator (the two are meant to be used together).
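To see why the two options go together, here is a small sketch (assuming GNU find and bash) that creates one ordinary file and one whose hypothetical name contains a newline, then counts them both ways. -name is used here instead of -regex only to keep the matching simple.

```bash
#!/usr/bin/env bash
# Sketch (assumes GNU find and bash): count files both ways when one
# hypothetical file name contains a newline.
DIR=$(mktemp -d)
touch "$DIR/plain.mkv" "$DIR/two"$'\n'"lines.mkv"

# Newline-separated, as with $(find ...): the odd name spans two lines.
newline_count=$(find "$DIR" -type f -name '*.mkv' | wc -l)

# NUL-separated: -print0 plus read -d '' keeps each name in one array element.
declare -a nul_files=()
while IFS= read -r -d '' file
do
  nul_files+=("$file")
done < <(find "$DIR" -type f -name '*.mkv' -print0)

echo "newline-separated lines: $newline_count"  # 3 lines for 2 files
echo "NUL-separated entries: ${#nul_files[@]}"  # 2 entries
rm -rf -- "$DIR"
```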

You can get the list of files using "${vid_files[@]}" (the double quotes are important to prevent word splitting). You can also iterate over the list easily:

for file in "${vid_files[@]}"
do
  echo "$file"
done
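A quick sketch of what the double quotes protect against, using two hypothetical names, one of which contains spaces:

```bash
#!/usr/bin/env bash
# Sketch: count loop iterations with and without quotes around the expansion.
# The two file names are hypothetical; the first contains spaces.
vid_files=("Robot Wars ep1.mkv" "holiday.mp4")

quoted=0
for file in "${vid_files[@]}"; do quoted=$((quoted + 1)); done

unquoted=0
# Unquoted, each element is word-split on spaces (and subject to globbing).
for file in ${vid_files[@]}; do unquoted=$((unquoted + 1)); done

echo "quoted: $quoted, unquoted: $unquoted"  # quoted: 2, unquoted: 4
```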
Fred
  • Hi @Fred - I appreciate your approach. I also agree with your identification of the possible newline character, but I know that's not in play here. – Mark Apr 24 '17 at 20:45
  • Code gets copied and pasted all the time, and corner cases are irrelevant until, one day, they are. That being said, you can use `grep -v` as suggested in the comments if you do not mind using a newline-separated list. – Fred Apr 24 '17 at 20:57
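A sketch of the newline-separated variant mentioned in that last comment, keeping the question's original find command and filtering with grep -v. It assumes no path contains a newline; the directory layout is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: newline-separated list filtered with grep -v.
# Assumes no path contains a newline; the directory layout is hypothetical.
DIR=$(mktemp -d)
mkdir -p "$DIR/Robot Wars" "$DIR/misc"
touch "$DIR/Robot Wars/ep1.mkv" "$DIR/misc/holiday.mp4" "$DIR/misc/RobotChicken.avi"

# find selects the video extensions; grep -v drops every line containing "Robot".
vid_files=$(find "${DIR}" -type f -regex ".*\.\(mkv\|avi\|ts\|mp4\|m2ts\)" | grep -v 'Robot')

printf '%s\n' "$vid_files"
rm -rf -- "$DIR"
```

The same filter can be appended to the question's locate command (locate -ir "..." | grep -v 'Robot'), since it only looks at the text of each line.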