
I just want to know: is it possible, with the snippet below, to also assign the count of files matched by the find command to the total variable?

total=0
counter=1
while IFS= read -r -d '' file; do
    # $total is still 0 here - how can it be set from the same find?
    echo "process file $counter of $total"
    ((counter++))
done < <(find . -iname "*.txt" -type f -print0 | sort -zn)

NOTE: Would it be an efficient approach to execute the find command before the loop, count the total from its output, and then also use its results in the loop?

  • I am able to count the output of the find command. However, my question is whether it is possible, with one find command, to accomplish both feeding the find results to the while loop and counting the files. – Sherzad Dec 06 '19 at 09:42
  • @oguzismail - you say _"just update the variables"_. Sure, the `$counter` can be incremented by `+1` during each turn of the loop; however, how can the `$total` variable be known in advance _without_ running an additional `find` first? This, I believe, is the crux of the question. – RobC Dec 06 '19 at 10:50
  • @RobC right that is the most important point at issue. – Sherzad Dec 06 '19 at 10:55
  • Oh now I got it, yeah – oguz ismail Dec 06 '19 at 10:58
  • Still the question is so vague that it seems impossible to post a correct answer – oguz ismail Dec 06 '19 at 10:59
  • @oguzismail the question is whether it is possible with the same find command. – Sherzad Dec 06 '19 at 12:16
  • 1
    @AbdulRahmanSherzad - AFAIK you'll need to iterate the file list twice, whether via; **1)** Doing two find's, e.g. [script.sh](https://paste.ee/p/bFqXU) **2)** Or, doing one find using a `while` and another `for` loop, e.g. [script-2.sh](https://paste.ee/p/apeTw). You say "Is it an efficient approach" suggests you're concerned about performance. Running; `time ./script.sh` or `time ./script-2.sh` the delta in time taken to complete is negligible. What's for certain when using `find` twice, as per `script.sh`, it can have undesired results if files were added/deleted between the two executions. – RobC Dec 06 '19 at 16:16
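
A minimal sketch of option 1 (two find runs), assuming GNU find for `-printf`; note the race condition if files are added or deleted between the two runs:

# count first: one '.' per file, counted as bytes (NUL-safe, unlike wc -l)
total=$(find . -iname "*.txt" -type f -printf '.' | wc -c)
counter=0
while IFS= read -r -d '' file; do
    ((++counter))
    echo "process file $counter of $total"
    # do something with "$file"
done < <(find . -iname "*.txt" -type f -print0 | sort -zn)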

1 Answer


I assume you want to display the progress of the processing in the while loop file by file. However, as long as we increment the value inside the loop, we cannot determine the value of $total until the while loop ends.

As an alternative, you can create an array of the files first, then iterate over the files already knowing the value of $total.

Would you try the following:

# read the NUL-delimited file names into an array (-d "" requires Bash >= 4.4)
mapfile -d "" -t files < <(find . -iname "*.txt" -type f -print0 | sort -zn)
total="${#files[@]}"
counter=0
for file in "${files[@]}"; do
    ((++counter))
    echo "process file $counter of $total"
    # do something with "$file"
done
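
If your Bash is older than 4.4, where `mapfile` gained the `-d` option, a `while read` loop can build the same NUL-safe array; a minimal sketch, equivalent in essence to RobC's script-2.sh:

files=()
while IFS= read -r -d '' file; do
    files+=("$file")    # append each NUL-delimited name
done < <(find . -iname "*.txt" -type f -print0 | sort -zn)
total="${#files[@]}"

The for loop above then works unchanged.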

Hope this helps.

  • 1
    Presumably this requires Bash `>=4.4` as it utilizes the `-d` option with `mapfile` to create the array, as mentioned [here](https://stackoverflow.com/questions/23356779/how-can-i-store-the-find-command-results-as-an-array-in-bash#answer-54561526)? Is it correct to assume that if you're using Bash `<4.4` you'll need to revert to utilizing a `while` loop instead of `mapfile -d "" ...` to create the array and correctly handle file names containing newlines and spaces - as per [script-2.sh](https://paste.ee/p/apeTw) that I previously mentioned in the comments? – RobC Dec 07 '19 at 09:27
  • @tshiono I have read in another thread on Stack Overflow that one should **never use a for loop with the find command** because of the command-line buffer, and also because the loop must wait until find runs to completion. I just want to know if your proposed solution is efficient? – Sherzad Dec 09 '19 at 10:43
  • @AbdulRahmanSherzad Unfortunately I could not find the thread. Would you please point me to it? IMHO I don't think combining `find` with a `while` loop (as in your posted example) is an antipattern. Regarding the efficiency, it is difficult to give a single answer because it depends on the size of the data (number of files) and the requirements, but I believe my proposal will be efficient enough for most cases. – tshiono Dec 09 '19 at 12:13
  • @tshiono thanks for your reply. I am dealing with 10000 files whose file names are 40-50 characters long. Here is the link to the thread post [link](https://stackoverflow.com/a/9612560/2021982) – Sherzad Dec 09 '19 at 12:20
  • @AbdulRahmanSherzad Thank you for the quick reply. I've understood what the author may want to say. In this case you could use `for file in *.txt; do ...`, which will be most efficient. Going back to the efficiency issue, I'd recommend testing on your data yourself. If you want to indicate the total count of the files in the loop, you need to wait for the `find` command to complete anyway. – tshiono Dec 09 '19 at 12:38
  • 1
    @tshiono many thanks for your inputs. As **RobC** stated it seems -d is support for Bash >= 4.4 because in my case it displays this error **mapfile: -d: invalid option**. – Sherzad Dec 09 '19 at 14:47
  • @AbdulRahmanSherzad Thank you for the try. If the `-d` option for `mapfile` is not supported, please try RobC's alternative found [here](https://paste.ee/p/apeTw). It is equivalent to my proposal in essence. It may take slightly longer due to the `while` loop, but not significantly. I've generated 10,000 files with 50-character file names and measured the execution time of the array-generating-with-find block. My proposal took 0.1 sec and RobC's one took 0.2 sec. I suppose both will work efficiently. (A sketch of such a test follows below.) – tshiono Dec 09 '19 at 23:18
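
A hypothetical sketch of such a test; the directory name and file-name pattern here are assumptions, not taken from the thread:

# generate 10,000 files with ~50-character names in a scratch directory
mkdir -p /tmp/findtest && cd /tmp/findtest || exit 1
for i in $(seq -w 1 10000); do
    # name = "file_" + 5 digits + "_" + 35 x's + ".txt" (~50 chars)
    touch "file_${i}_$(printf 'x%.0s' {1..35}).txt"
done
# time only the array-building block (mapfile -d needs Bash >= 4.4)
time { mapfile -d "" -t files < <(find . -iname "*.txt" -type f -print0 | sort -zn); echo "${#files[@]} files"; }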