On a Linux server, data files are dumped continuously into a directory at intermittent intervals, say every 5, 10, or even 15 minutes. I want to preprocess/cleanse these files one by one and SCP them to another server.

How should I process all these files recursively?

Should I write a single bash script that runs continuously and processes files recursively in that directory, or should I schedule a script to run every 10 minutes?

For a single continuously running script, what should the loop condition be? An infinite while loop?


1 Answer


I'd go for a scheduled script with cron, as infinite loops are, in a sense, bugs.
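
A crontab entry like the one below would run the script every 10 minutes. The script path is a placeholder, and the flock wrapper (from util-linux) is an optional guard that skips a run if the previous one is still working, in case a pass ever takes longer than the interval:

# run every 10 minutes; skip if the previous run still holds the lock
*/10 * * * * flock -n /tmp/process_files.lock /path/to/process_files.sh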

For the processing part, I'm not sure this is exactly what you asked for, but you can do something like this:

#!/bin/bash
FILES=/your/dir/*
for file in $FILES
do
  echo "I'm doing something with $file"
done
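
The loop above only echoes each name. As a rough sketch of the full preprocess-and-ship flow, and only a sketch: the directory, the sed cleanup step, the remote destination, and the processed/ subdirectory below are all placeholder assumptions to adapt:

#!/bin/bash
# Sketch: cleanse each regular file, scp it, then move it aside so the
# next cron run does not pick it up again. All paths are placeholders.
src=/your/dir
done_dir=$src/processed
mkdir -p "$done_dir"

for file in "$src"/*; do
  [ -f "$file" ] || continue              # skip subdirs and an unmatched glob
  sed -i '/^$/d' "$file"                  # placeholder cleanup: drop blank lines
  if scp "$file" user@remotehost:/dest/; then
    mv "$file" "$done_dir"/               # ship succeeded; do not reprocess
  fi
done

Moving (or deleting) each file after a successful copy is what keeps repeated cron runs from shipping the same data twice.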
  • An infinite loop is not a bug. With a scheduled script, files that arrive just after a run sit unprocessed until the next one. – chepner Sep 10 '14 at 12:01
  • BTW, if you want to store a list of filenames in a variable, you need to use an array. As it is, `$FILES` is storing only the glob expression itself, not storing any actual filenames; `files=( /your/dir/* )` would be storing actual names, after which point one could iterate over those names with `for file in "${files[@]}"`. – Charles Duffy Dec 29 '15 at 23:01
  • 1
    Also, using all-caps names for your own variables is bad form. See fourth paragraph of http://pubs.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap08.html for POSIX conventions on environment variable names, keeping in mind that environment variables and shell variables share a namespace (so a poorly-named shell variable can unintentionally override an environment variable -- not just for the current process, but all subprocesses as well). – Charles Duffy Dec 29 '15 at 23:02
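
Picking up on chepner's point about latency: if files must not wait for the next cron tick, an event-driven loop is the usual alternative. Here is a minimal sketch using inotifywait from the inotify-tools package (the watched directory and scp destination are placeholders); it blocks until a file is fully written, so it is not a busy infinite loop:

#!/bin/bash
# React to each file as soon as the writer closes it (close_write),
# rather than polling on a schedule. /your/dir is a placeholder.
inotifywait -m -e close_write --format '%w%f' /your/dir |
while read -r file; do
  echo "processing $file"
  scp "$file" user@remotehost:/dest/
done

This also sidesteps the glob-in-a-variable pitfall from the comments above, since each filename arrives on its own line; note it still assumes filenames without embedded newlines.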