How to identify whether a file is still being written or is complete, from a Linux script

Question

We have a system that generates files. From a script, I want to find which of the many files are complete and have not been modified in the past two minutes, and then rename those.

This is what I tried but the result is not correct. Could someone help?

for file in /home/test/*abc_YYYYMMDDhhmmss*
do
    f1=`basename $file`
    if [ lsof | grep "$f1" = "" ];then
        if  [ `stat --format=%Y $file` -le $(( `date +%s` - 300 )) ]; then
        mv "$f1" "${f1}_Complete"
    else
       echo "no files to collect"
    fi
done

What is `if [ lsof | grep "$f1" = "" ]`? Even if `[` worked like this, the casual reader would expect that the author's intent was to check that the output of grep was empty, but `grep "$f1" = ""` looks like a call to `grep` with `=` as a filename. You can always just check the value returned by grep, e.g. `if ! lsof | grep -q "$f1"; then` does what I *think* you're trying to accomplish here. — William Pursell, Jan 23 '21 at 13:13

tripleee · Accepted Answer · 2021-01-23T10:55:34.773

You are making the common mistake of assuming that [ is part of the if command's syntax; but it's not: [ is just another command. The syntax for an if statement is

if commands; then
    : what to do if the exit code from commands was 0
else
    : what to do if not
fi

where commands can be an arbitrarily complex sequence of commands, and the exit code from the last command in the sequence decides which branch to take; and the else branch is optional.
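
To make that concrete, here is a minimal sketch in which `if` branches on the exit status of an ordinary function and of `grep -q`, with `[` used afterwards as just one more command. The helper names `contains` and `describe` are made up for this illustration.

```sh
#!/bin/sh
# `if` branches on the exit status of whatever command follows it.
# `[` is one such command; grep -q is another.

# contains TEXT NEEDLE (hypothetical helper)
# Succeeds (exit 0) when the literal string NEEDLE occurs in TEXT.
contains() {
    printf '%s\n' "$1" | grep -Fq -- "$2"
}

# describe TEXT NEEDLE (hypothetical helper)
describe() {
    if contains "$1" "$2"; then
        echo "found"
    else
        echo "missing"
    fi
}

describe "alpha beta" "beta"     # prints "found"
describe "alpha beta" "gamma"    # prints "missing"

# `[` works the same way: `if` only looks at its exit status.
if [ -n "non-empty" ]; then
    echo "test succeeded"
fi
```

No brackets are needed around `contains` because `if` never inspects output, only the exit status.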

As a minimal fix, change to

    # use modern $(command substitution) syntax
    # instead of obsolescent `command substitution`;
    # always quote variables with file names
    f1=$(basename "$file")
    # Remove [ and switch to grep -q;
    # add -F to grep flags for literal matching
    if ! lsof | grep -Fq "$f1"; then
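
Applied to the whole loop from the question, the minimal fix might look like the following sketch. The function names `list_open_files` and `rename_completed` and the minimum-age parameter are assumptions made for this illustration, not part of the original script. It also assumes GNU `stat`, runs `lsof` once instead of once per file, and renames `"$file"` (the full path) rather than the bare basename. Matching the basename as a substring of the `lsof` output can produce false positives when one file name is contained in another.

```sh
#!/bin/sh
# list_open_files (hypothetical helper): wraps lsof so that the
# listing can be replaced, e.g. in a test.
list_open_files() {
    lsof 2>/dev/null
}

# rename_completed MIN_AGE_SECONDS FILE...   (hypothetical helper)
# Renames each FILE that is not mentioned in the lsof listing and
# whose modification time is at least MIN_AGE_SECONDS in the past.
rename_completed() {
    min_age=$1
    shift
    now=$(date +%s)
    open_files=$(list_open_files)     # run lsof once, not once per file
    for file; do
        [ -f "$file" ] || continue    # skip an unmatched wildcard
        f1=$(basename "$file")
        if printf '%s\n' "$open_files" | grep -Fq -- "$f1"; then
            continue                  # some process seems to have it open
        fi
        # stat --format=%Y (GNU stat) prints the mtime in epoch seconds
        if [ "$(stat --format=%Y "$file")" -le $(( now - min_age )) ]; then
            mv -- "$file" "${file}_Complete"
        fi
    done
}

# Usage (not run here):
#   rename_completed 120 /home/test/*abc_YYYYMMDDhhmmss*
```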

Anyway, what about something like this instead?

find $(lsof |
    awk 'NR==FNR { if ($9 ~ /^\/home\/test\//) a[$9]++; next }
    FNR == 1 {
        if (! (FILENAME in a)) print FILENAME;
        next }' - /home/test/*abc_YYYYMMDDhhmmss*) \
    -type f -mmin +2 -exec sh -c '
        for file; do
            mv "$file" "${file}_Complete"
        done' _ {} +

This is pretty complex, but here's a rundown.

lsof | awk ... prints out the files which are not open from the wildcard matches.
- This assumes that the files are regular text files - some Awk variants have trouble with binary input files. It would probably not be too hard to refactor this to avoid this constraint if it's problematic.
- In some more detail, the first argument to Awk is - i.e. standard input, which reads the pipe from lsof. The condition NR==FNR is true for the first input file; we simply collect the open files into the associative array a. Then the second condition prints the name of the current input file if it's not in the array; this is executed for the remaining input files, i.e. those which match the wildcard.
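This is passed as the paths for find to examine; it will look for any files which have not been modified in the last two minutes (-mmin +2), and pass the result to the command in -exec. Beware that if the command substitution prints nothing at all, GNU find falls back to searching the current directory, so you may want to guard against that case.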
The simple shell script in -exec should be easy to understand. find passes the found files as command-line arguments, but sh -c fills them from $0 so we pass in a dummy _ to push the file names into $1, $2 etc which is what for loops over if you don't give it a list of arguments.
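
The two idioms in this rundown can be tried in isolation. The sketch below uses hypothetical helper names (`not_listed`, `show_args`) and throwaway files created with `mktemp`: the first function is the `NR==FNR` pattern reduced to its core, the second shows why the dummy `_` is needed after `sh -c`.

```sh
#!/bin/sh
# not_listed FILE...   (hypothetical helper)
# Reads a list of paths on standard input and prints each FILE argument
# whose name is not in that list. Assumes standard input is non-empty
# (lsof output normally starts with a header line) and that every FILE
# has at least one line, since the FNR == 1 rule never fires for an
# empty file.
not_listed() {
    awk 'NR == FNR { seen[$0]++; next }   # first input: remember every line
         FNR == 1 && !(FILENAME in seen) { print FILENAME }' - "$@"
}

# show_args ARG...   (hypothetical helper)
# Shows how sh -c assigns its operands: the first goes to $0, the rest
# to $1, $2, ... which is what a bare "for arg" iterates over.
show_args() {
    sh -c 'echo "zero=$0"; for arg; do echo "arg=$arg"; done' _ "$@"
}

tmp=$(mktemp -d)
for name in a b c; do
    echo "data" > "$tmp/$name"
done
# Pretend only "b" is open; the paths of "a" and "c" are printed.
printf '%s\n' "$tmp/b" | not_listed "$tmp/a" "$tmp/b" "$tmp/c"
rm -rf "$tmp"

# Prints zero=_, then arg=one, then arg=two words, one per line.
show_args one "two words"
```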

This will probably not work if your file names contain newlines; then you'll need something more complex still.

Looping over arbitrary file names is disappointingly complex in Bourne-family shells, and finding elements not in a list is always slightly pesky in shell script. Ksh and Bash offer some relief because they have arrays, but this is not portable to POSIX sh / ash / dash.

Hi, thanks for the update. I tried to incorporate the changes you mentioned and this is what I could do with the code; the issue is that it takes a lot of time to run. `for file in /home/test/*abc_YYYYMMDDhhmmss* do` `f1=`basename $file` `if lsof | grep "$f1" = "" ; then if `stat --format=%Y $file` -le $(( `date +%s` - 300 )) ; then mv "$f1" "${f1}_Complete" else echo "no files to collect" fi done ` — Rohit Shamdasani, Jan 24 '21 at 11:33
If I can guess where there is supposed to be punctuation in that, it looks like you reverted half of the important and useful changes. — tripleee, Jan 24 '21 at 13:42

Jay · Answer 2 · 2021-01-24T23:50:09.060

How about this way?

#!/bin/bash
find /home/test/* -type f -mmin +2 -print0 |
    while IFS= read -r -d '' line; do
        echo "$line"
        # fuser -s exits 0 if some process has the file open;
        # rename only when no process does
        if ! fuser -s "$line"; then
            mv "$line" "${line}_Completed"
        fi
    done

-mmin +2 means the files that have not been modified in the last 2 minutes.

Edit: as requested, I have changed it to check if the file is currently open. fuser -s "$line" exits with status 0 if the file is currently open, in which case the loop skips it; otherwise it proceeds to rename the file. I have also wrapped the variables in quotes, thanks for the heads up.

edited Jan 24 '21 at 23:50

answered Jan 23 '21 at 07:18

Hi Jay, thanks for the update, but this method only checks one condition: whether the file has not been modified in the last two minutes. I want to add one more condition to check whether the file is currently being modified or is open by some other user. Could you help to configure that? – Rohit Shamdasani Jan 23 '21 at 09:26
[When to wrap quotes around a shell variable?](https://stackoverflow.com/questions/10067266/when-to-wrap-quotes-around-a-shell-variable) – tripleee Jan 23 '21 at 10:56
@RohitShamdasani I have updated the code, please check – Jay Jan 24 '21 at 23:53
