0

I run the following bash command to format a log with a file inode and hash:

time find BASE_DIR -maxdepth 1 -mindepth 1 -type d |
sort |
xargs -P100 -n1 -IFF bash -ce "find FF -type f" |
sort |
xargs -n1 -I {} bash -ce "
    FILE=$1; INODE=`stat -c '%i' $FILE`;
    HASH=`cat $FILE | md5sum | cut -d' ' -f1`;
    printf 'Name: %s - Inode: 0x%X - MD5: %s\n' $FILE $INODE $HASH;" {}

But every time I run this I get something like this:

Name: FILE1 - Inode: 0xFFFFFFFFFFFFFFFF> - MD5: SOME_MD5
Name: FILE1 - Inode: 0xFFFFFFFFFFFFFFFF> - MD5: SOME_MD5
Name: FILE1 - Inode: 0xFFFFFFFFFFFFFFFF> - MD5: SOME_MD5
Name: FILE1 - Inode: 0xFFFFFFFFFFFFFFFF> - MD5: SOME_MD5

The same file every time. How do I correctly pass the args to bash?

EDIT
I was able to solve the problem by changing the second xargs to:

xargs -n1 bash -ce '
    path="$0";
    inode=`stat -c "%i" $path`;
    hash=`cat $path | md5sum | cut -d" " -f1`;
    printf "Name: %s - InodeContext<0x%X> - MD5: %s\n" $path $inode $hash;'
CforLinux

2 Answers

3

The immediate error is that the first argument passed to bash -c "...commands..." ends up in $0, not in $1. Also, because you used double quotes around "...commands...", the calling shell expands all the variables and backticks once, before the script text is ever passed to the subshell. Every invocation then runs the same already-expanded text, which is why each output line is identical.
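Both effects are easy to reproduce in isolation. The following is a minimal sketch, assuming bash; `show_args` is a hypothetical helper introduced only for this illustration, and `_` is just a conventional placeholder for `$0`.

```shell
#!/usr/bin/env bash
# With bash -c, the first word after the script text becomes $0, the next $1.
show_args() {
    bash -c 'printf "0=%s 1=%s\n" "$0" "$1"' "$@"
}

show_args file.txt      # the file name lands in $0; $1 stays empty
show_args _ file.txt    # a placeholder fills $0, so the file name is $1

# Double quotes let the calling shell expand $1 before bash -c starts;
# single quotes hand the text to the inner shell untouched.
set -- outer
dq=$(bash -c "printf '%s' \"$1\"" _ inner)   # inner shell sees: printf '%s' "outer"
sq=$(bash -c 'printf "%s" "$1"' _ inner)     # inner shell expands its own $1
printf 'double-quoted: %s\nsingle-quoted: %s\n' "$dq" "$sq"
```

The double-quoted variant reports the caller's value and the single-quoted one reports the value passed to the inner shell.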

This seems really convoluted, though. Without knowledge of what exactly you hope to accomplish, this is quite speculative, but I would approach it something like

# -mindepth 2 selects the files below BASE_DIR's immediate subdirectories
time find BASE_DIR -mindepth 2 -type f -exec bash -c '
    for f; do
        inode=$(stat -c "%i" "$f")
        # md5sum is the GNU tool, matching the GNU-style stat -c above
        md5=$(md5sum <"$f" | cut -d " " -f1)
        printf "Name: %s - Inode: 0x%X - MD5: %s\n" "$f" "$inode" "$md5"
    done' _ {} +

If you can explain how you want the output to be sorted, maybe put the sort key in the printf and sort by that at the end.
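As a rough sketch of that idea, assuming the sort key is simply the path: each record is prefixed with its key and a tab, sorted once at the end, and the key is cut off again. `format_sorted` is a hypothetical helper, and the shortened `Name:` record stands in for the full log line.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read paths on stdin, emit "key<TAB>record",
# sort on the key, then strip the key again.
format_sorted() {
    while IFS= read -r f; do
        printf '%s\tName: %s\n' "$f" "$f"
    done |
    LC_ALL=C sort -t "$(printf '\t')" -k1,1 |
    cut -f2-
}

printf '%s\n' b/2 a/1 c/3 | format_sorted
```

Any other key (the inode, the hash) could be printed in the first column instead, as long as it contains no tab.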

Notice also how we avoid uppercase names for private variables; uppercase names are conventionally reserved for environment and shell-internal variables.

tripleee
  • Your solution is what I wanted to do in the beginning, but I have more than 5 million files, which is why I tried to run this with xargs (in the end I wanted to add -P N to the second xargs) – CforLinux Mar 04 '19 at 13:15
  • Parallelizing jobs which are I/O bound will generally just congest your disk. – tripleee Mar 04 '19 at 13:17
0

I find it easier to do one thing at a time and do it well.

The -I{} option implies -L 1 (one command per input line), so adding -n1 is redundant. I am still used to the older -i option, which is the same as -I{} but is deprecated in GNU xargs.
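A quick way to see the one-command-per-line behaviour is the small sketch below; `per_line` is a hypothetical helper used only for this demonstration.

```shell
#!/usr/bin/env bash
# With -I{}, xargs runs the command once per input line, and blanks
# inside a line stay part of the same argument.
per_line() {
    xargs -I{} printf '[%s]\n' {}
}

printf 'a b\nc d\n' | per_line
```

Each input line arrives as a single argument, so two lines produce two bracketed records.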

find BASE_DIR -maxdepth 1 -mindepth 1 -type d |
# what's the point in sorting before -P100?
xargs -P100 -i find {} -type f |
sort |
# run stat and md5sum for the same file
# output: <filename> <stat output> <md5sum>
xargs -n1 bash -ce '
      printf "%s\n" "$1"; 
      stat -c "%i" "$1"; 
      md5sum "$1" | cut -d" " -f1;
' -- |
# for every three (filename, stat, md5sum) arguments run printf
xargs -n3 printf 'Name: %s - Inode: 0x%X - MD5: %s\n'
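The pipeline above splits on whitespace, so file names containing blanks or newlines would be misparsed. A possible variant, sketched here under the assumption of GNU find, sort, xargs, stat and md5sum, passes null-delimited names and handles them in batches. `log_files` and the batch size of 64 are illustrative choices, not part of the original answer.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print one formatted line per regular file below "$1".
log_files() {
    find "$1" -type f -print0 |
    sort -z |
    # -0 reads null-delimited names, -r skips the run on empty input,
    # -n 64 hands up to 64 names to each bash invocation.
    xargs -0 -r -n 64 bash -c '
        for f; do
            inode=$(stat -c "%i" "$f")
            md5=$(md5sum <"$f" | cut -d " " -f1)
            printf "Name: %s - Inode: 0x%X - MD5: %s\n" "$f" "$inode" "$md5"
        done' _
}
```

It would be called as `log_files BASE_DIR`. Adding -P N to this xargs would run batches in parallel, at the cost of losing the sorted output order.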
KamilCuk