2

EDIT - Reproducible Code, Output, and Updated Question

#!/bin/bash

# Input Directory
inpd="/home/space user/space test/space dir1"

# Line To Parse
line="/home/space user/space test/space dir1: space file 1.txt"

# Split Line awk -F[:]
echo ""
var1=$(echo "$line" | awk -F[:] '{print $1}')
echo "  echo var1"
echo "  $var1"
echo ""
echo "  printf var1"
printf "%-2s%s" "" $var1

# var1 == inpd
echo ""
echo ""
echo "  var1 == inpd"
if [ var1 == inpd ]; then
  printf "  Match."
else
  printf "  No match."
fi
echo ""

$ scriptname

  echo var1
  /home/space user/space test/space dir1

  printf var1
  /home/spaceuser/spacetest/spacedir1

  var1 == inpd
  No match.

Updated Question - How to define, cast, or properly compare var1 to inpd so it produces a match when the input has spaces? If there is a better way to find the match without calling awk it would also solve my problem.
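
A possible reading of the output above, offered as a sketch rather than a full diagnosis: `[ var1 == inpd ]` compares the literal words var1 and inpd, and the unquoted $var1 handed to printf is split on spaces into several arguments. The example below keeps awk as the splitter and changes only the quoting. The variable name `result` is introduced here purely for illustration.

```shell
#!/bin/bash
inpd="/home/space user/space test/space dir1"
line="/home/space user/space test/space dir1: space file 1.txt"

# awk still does the split; quoting "$line" keeps the spaces intact on the way in.
var1=$(printf '%s\n' "$line" | awk -F: '{print $1}')

# Quote the argument so printf receives one word instead of several.
printf '  %s\n' "$var1"

# Compare the expanded values, not the literal words "var1" and "inpd".
if [ "$var1" = "$inpd" ]; then
  result="Match."
else
  result="No match."
fi
printf '  %s\n' "$result"
```

The single `=` inside `[ ]` is the portable spelling; bash also accepts `==`.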

I found the clue to solve my question here:

How can I remove all text after a character in bash?

$ script - this gives a Match!

#!/bin/bash

# Input Directory
inpd="/home/space user/space test/space dir1"

# Line To Parse
line="/home/space user/space test/space dir1: space file 1.txt"

# var1 keeps everything in 'line' before :
var1=${line%:*}
echo ""
echo "$line"
echo "$var1"
printf '%s' "$var1"

# "$var1" == "$inpd"
echo ""
if [ "$var1" == "$inpd" ]; then
  printf "  Match."
else
  printf "  No match."
fi
echo ""

EDIT - Why the Long Post?

I made a long post to show my script development effort, but the question now reduces to this: how to match any path, with or without spaces (for example /path with/ or without spaces/dir1), against the same path string extracted from the output lines of the diff command. I am using awk with -F[:] as the separator, but there may be an alternative way to do it. Reproducible code is embedded above and below under headings labeled Reproducible Code. The updated question is the one stated in the edit above; the rest of the long post preserves the new context and the original post.

For my use cases the custom script is non-recursive; it would handle spaces in the path or filenames; but as of now it would generate errors for any path or filename containing a colon : character and also for any filename containing a slash /. I am not sure what other characters or sequences would produce an error and I don't need a more robust script for my present purposes.
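
One way the colon limitation might be relaxed, sketched under an assumption: diff prints the directory in an "Only in" line exactly as it was passed on the command line. If that holds, the known prefix "Only in <dir>: " can be stripped as a whole instead of splitting the line at a colon. The helper name `only_name` is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: if line "$1" is an "Only in <dir>: <name>" line for
# directory "$2", print <name>; otherwise return non-zero.
only_name() {
  prefix="Only in $2: "
  case $1 in
    "$prefix"*) printf '%s\n' "${1#"$prefix"}" ;;   # quoted pattern is matched literally
    *) return 1 ;;
  esac
}

# A directory and a file name that both contain a colon.
only_name "Only in /tmp/a:b c: x:y.txt" "/tmp/a:b c"
```

Because the prefix is matched literally, colons inside the directory or the file name no longer decide where the split happens.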

Any input path containing spaces must be enclosed in quotes, for example dirt "/path with spaces/dir1".
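
A small illustration of why the quotes matter, both on the command line and inside the script. The function `count_args` is a hypothetical helper that only reports how many arguments it received.

```shell
#!/bin/bash
# Hypothetical helper: print the number of arguments received.
count_args() { printf '%s\n' "$#"; }

dir="/path with spaces/dir1"
quoted=$(count_args "$dir")     # one argument: the whole path
unquoted=$(count_args $dir)     # split on spaces into three arguments
printf 'quoted: %s, unquoted: %s\n' "$quoted" "$unquoted"
```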

So far I think if subdirectories appear in only one directory, as shown in my test directory structure, then in the absence of file extensions there is no way to determine whether the name refers to a file or subdirectory. I intend to use tree to list directories with color to show files and subdirectories and also use the new script dirt to compare files that are the same or different. This will probably work best for directories with few files and not many subdirectories which is my intended use case.
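
The text of the diff output alone does not say whether an "Only in" entry is a file or a subdirectory, but if the compared directories still exist when the script runs, the file system can be asked directly. The sketch below assumes exactly that; `entry_type` is a hypothetical helper, and the demonstration uses a throwaway directory created with mktemp.

```shell
#!/bin/bash
# Hypothetical helper: classify entry "$2" inside directory "$1".
entry_type() {
  if [ -d "$1/$2" ]; then
    printf 'dir\n'
  else
    printf 'file\n'
  fi
}

# Demonstration in a temporary directory whose path contains a space.
demo=$(mktemp -d)
mkdir -p "$demo/space dir1/subdir1"
: > "$demo/space dir1/only1.txt"
kind_sub=$(entry_type "$demo/space dir1" "subdir1")
kind_txt=$(entry_type "$demo/space dir1" "only1.txt")
printf 'subdir1: %s, only1.txt: %s\n' "$kind_sub" "$kind_txt"
rm -rf "$demo"
```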

EDIT - Desired Output Format (Script Name dirt Using Test Directories Below)

$ dirt "/home/joe/test dirdiff/dir1" "/home/joe/test dirdiff/dir2"

BOTH    /home/joe/test dirdiff/dir1               /home/joe/test dirdiff/dir2
diff    diff.txt                                  diff.txt
        diffout.txt
        only1.txt
                                                  only2.txt
same    same space.txt                            same space.txt
same    same.txt                                  same.txt
        space 1.txt
                                                  space 2.txt
        subdir1
                                                  subdir2
comd    subdirC                                   subdirC

EDIT - Directory Structure With Spaces (Without :) To Test Script

/home/joe/test dirdiff
├── dir1
│   ├── diff.txt
│   ├── diffout.txt
│   ├── only1.txt
│   ├── same space.txt
│   ├── same.txt
│   ├── space 1.txt
│   ├── subdir1
│   └── subdirC
└── dir2
    ├── diff.txt
    ├── only2.txt
    ├── same space.txt
    ├── same.txt
    ├── space 2.txt
    ├── subdir2
    └── subdirC

EDIT - Output from running diff

$ diff -qs "/home/joe/test dirdiff/dir1" "/home/joe/test dirdiff/dir2"

Files /home/joe/test dirdiff/dir1/diff.txt and /home/joe/test dirdiff/dir2/diff.txt differ
Only in /home/joe/test dirdiff/dir1: diffout.txt
Only in /home/joe/test dirdiff/dir1: only1.txt
Only in /home/joe/test dirdiff/dir2: only2.txt
Files /home/joe/test dirdiff/dir1/same space.txt and /home/joe/test dirdiff/dir2/same space.txt are identical
Files /home/joe/test dirdiff/dir1/same.txt and /home/joe/test dirdiff/dir2/same.txt are identical
Only in /home/joe/test dirdiff/dir1: space 1.txt
Only in /home/joe/test dirdiff/dir2: space 2.txt
Only in /home/joe/test dirdiff/dir1: subdir1
Only in /home/joe/test dirdiff/dir2: subdir2
Common subdirectories: /home/joe/test dirdiff/dir1/subdirC and /home/joe/test dirdiff/dir2/subdirC

EDIT - Script Fragment dirt00 Stores diff Output in $diffout

  #!/bin/bash
  if [[ -z "$1" || -z "$2" ]]; then
    printf "\n  Type $ dirt00 Dir1 Dir2\n"
  else
    input1="$1"
    input2="$2"
    diffout=$(diff -qs "$1" "$2")
    # printf '%s\n' "$diffout" is necessary: with printf '%s' "$diffout" the
    # input does not end with a newline, so the while loop would completely
    # miss the last line of the variable.
    while IFS= read -r line
      do
        echo "$line"
      done < <(printf '%s\n' "$diffout")
  fi
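
The comment about the trailing newline can be checked with a short experiment. This is only a sketch; `count_lines` is a hypothetical helper that counts how many times the loop body runs.

```shell
#!/bin/bash
# Hypothetical helper: count the lines that a plain read loop processes on stdin.
count_lines() {
  n=0
  while IFS= read -r _line; do
    n=$((n + 1))
  done
  printf '%s\n' "$n"
}

text='first line
last line'

without=$(printf '%s' "$text" | count_lines)     # no trailing newline
with=$(printf '%s\n' "$text" | count_lines)      # trailing newline added
printf 'without newline: %s, with newline: %s\n' "$without" "$with"
# prints: without newline: 1, with newline: 2
```

read returns a non-zero status when it reaches end of input before a newline, so the loop body is skipped for that final piece. A commonly used alternative is `while IFS= read -r line || [ -n "$line" ]`, which processes a final unterminated line as well.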

EDIT - Output from running dirt00

$ dirt00 "/home/joe/test dirdiff/dir1" "/home/joe/test dirdiff/dir2"

Files /home/joe/test dirdiff/dir1/diff.txt and /home/joe/test dirdiff/dir2/diff.txt differ
Only in /home/joe/test dirdiff/dir1: diffout.txt
Only in /home/joe/test dirdiff/dir1: only1.txt
Only in /home/joe/test dirdiff/dir2: only2.txt
Files /home/joe/test dirdiff/dir1/same space.txt and /home/joe/test dirdiff/dir2/same space.txt are identical
Files /home/joe/test dirdiff/dir1/same.txt and /home/joe/test dirdiff/dir2/same.txt are identical
Only in /home/joe/test dirdiff/dir1: space 1.txt
Only in /home/joe/test dirdiff/dir2: space 2.txt
Only in /home/joe/test dirdiff/dir1: subdir1
Only in /home/joe/test dirdiff/dir2: subdir2
Common subdirectories: /home/joe/test dirdiff/dir1/subdirC and /home/joe/test dirdiff/dir2/subdirC

EDIT - Reproducible Code Script dirt01

#!/bin/bash
input1="/home/joe/test dirdiff/dir1"
input2="/home/joe/test dirdiff/dir2"
diffout="Files /home/joe/test dirdiff/dir1/diff.txt and /home/joe/test dirdiff/dir2/diff.txt differ
Only in /home/joe/test dirdiff/dir1: diffout.txt
Only in /home/joe/test dirdiff/dir1: only1.txt
Only in /home/joe/test dirdiff/dir2: only2.txt
Files /home/joe/test dirdiff/dir1/same space.txt and /home/joe/test dirdiff/dir2/same space.txt are identical
Files /home/joe/test dirdiff/dir1/same.txt and /home/joe/test dirdiff/dir2/same.txt are identical
Only in /home/joe/test dirdiff/dir1: space 1.txt
Only in /home/joe/test dirdiff/dir2: space 2.txt
Only in /home/joe/test dirdiff/dir1: subdir1
Only in /home/joe/test dirdiff/dir2: subdir2
Common subdirectories: /home/joe/test dirdiff/dir1/subdirC and /home/joe/test dirdiff/dir2/subdirC"
# In the loop below, printf '%s\n' "$diffout" is necessary: with
# printf '%s' "$diffout" the input does not end with a newline, so the
# while loop would completely miss the last line of the variable.
printf "\n  %-8s%-40s%-40s\n" "BOTH" "$input1" "$input2"
while IFS= read -r line
  do
    #echo $line
    firstword=$(echo "$line" | awk '{print $1}')
    finalword=$(echo "$line" | awk '{print $NF}')
    if   [ "$finalword" == "differ" ]; then
      snip=${line%" differ"}
      echo "$snip" | awk -F[/] '{printf "  %-8s%-40s%-40s\n","diff",$NF,$NF}'
    elif [ "$finalword" == "identical" ]; then
      snip=${line%" are identical"}
      echo "$snip" | awk -F[/] '{printf "  %-8s%-40s%-40s\n","same",$NF,$NF}'
    elif [ "$firstword" == "Common" ]; then
      echo "$line" | awk -F[/] '{printf "  %-8s%-40s%-40s\n","comd",$NF,$NF}'
    else
      echo ""
    fi
  done < <(printf '%s\n' "$diffout")

EDIT - Output from running dirt01

$ dirt01

  BOTH    /home/joe/test dirdiff/dir1             /home/joe/test dirdiff/dir2
  diff    diff.txt                                diff.txt



  same    same space.txt                          same space.txt
  same    same.txt                                same.txt




  comd    subdirC                                 subdirC

I cannot write dirt02, to complete the script, without an answer to the updated question at the top of this post.

I left the original question and post below to preserve the context for the existing answer and comments which are greatly appreciated!

NOTE - Original Question and Post Below

In the two lines starting $NF=="differ" and $NF=="identical":

(1) How do I split the file name and extension from the directory, using either of the awk fields shown below as $2 or $4 (both end in the same file name), and then output filename.ext in the printf command?
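
For reference, when the full path is already held in a shell variable, the split can be sketched with parameter expansion alone, without awk. The variable names `path`, `name`, and `dir` are illustrative and do not come from the script below.

```shell
#!/bin/bash
path="/home/joe/test dirdiff/dir1/same space.txt"

name=${path##*/}   # remove the longest prefix ending in "/": leaves the file name
dir=${path%/*}     # remove the shortest suffix starting with "/": leaves the directory

printf 'dir:  %s\n' "$dir"
printf 'name: %s\n' "$name"
```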

dirdiff - bash script

  #!/bin/bash
  if [[ -z $1 || -z $2 ]]; then
    printf "\n  Type $ dirdiff Dir1 Dir2\n"
  else
    LEFT=$1
    LEFT+=:
    RGHT=$2
    RGHT+=:
    printf "\n  %-8s%-40s%-40s\n" "" "$1" "$2"
    printf "  %-8s%-40s%-40s\n\n" "" "$LEFT" "$RGHT"
    diff -qs $1 $2
    echo ""
    printf "\n%-8s%-40s%-40s\n" "INFO" "$1" "$2"
    diff -qs $1 $2 | awk -v L=$LEFT -v R=$RGHT \
                     '$NF=="differ" {printf "%-8s%-40s%-40s\n","diff", $2, $4} \
                      $NF=="identical" {printf "%-8s%-40s%-40s\n","same", $2, $4} \
                      $3==L {printf "%-8s%-40s\n","", $4} \
                      $3==R {printf "%-8s%-40s%-40s\n","", "", $4}'
  fi

This is the debug-and-develop script, which runs the command diff -qs $1 $2 twice. The first run shows the raw output. The second run pipes the output to awk, where I am trying to parse the lines and format the output on the command line. My questions relate to the final five lines of the script. EDIT: I solved the printf syntax problem in awk, as shown in the code.

Running dirdiff on the command line gives the following output:

$ dirdiff /usr/local/adm/sys /mnt/ssdroot/home/joe/admin/sys

          /usr/local/adm/sys                      /mnt/ssdroot/home/joe/admin/sys
          /usr/local/adm/sys:                     /mnt/ssdroot/home/joe/admin/sys:

Only in /mnt/ssdroot/home/joe/admin/sys: bashrc.txt
Only in /usr/local/adm/sys: debpkgs.txt
Files /usr/local/adm/sys/direnv.txt and /mnt/ssdroot/home/joe/admin/sys/direnv.txt differ
Only in /usr/local/adm/sys: dpiDec2022.txt
Only in /mnt/ssdroot/home/joe/admin/sys: mypkgs.txt
Only in /mnt/ssdroot/home/joe/admin/sys: pyenv.txt
Files /usr/local/adm/sys/ssh.txt and /mnt/ssdroot/home/joe/admin/sys/ssh.txt are identical
Files /usr/local/adm/sys/usbquirks.txt and /mnt/ssdroot/home/joe/admin/sys/usbquirks.txt differ


INFO    /usr/local/adm/sys                      /mnt/ssdroot/home/joe/admin/sys
                                                bashrc.txt
        debpkgs.txt
diff    /usr/local/adm/sys/direnv.txt           /mnt/ssdroot/home/joe/admin/sys/direnv.txt
        dpiDec2022.txt
                                                mypkgs.txt
                                                pyenv.txt
same    /usr/local/adm/sys/ssh.txt              /mnt/ssdroot/home/joe/admin/sys/ssh.txt
diff    /usr/local/adm/sys/usbquirks.txt        /mnt/ssdroot/home/joe/admin/sys/usbquirks.txt

Desired Command Line Output Format (Duplicated at Top)

$ dirdiff /usr/local/adm/sys /mnt/ssdroot/home/joe/admin/sys

INFO    /usr/local/adm/sys                        /mnt/ssdroot/home/joe/admin/sys
                                                  bashrc.txt
        debpkgs.txt
diff    direnv.txt                                direnv.txt
        dpiDec2022.txt
                                                  mypkgs.txt
                                                  pyenv.txt
same    ssh.txt                                   ssh.txt
diff    usbquirks.txt                             usbquirks.txt
SystemTheory
  • You've shown your current actual output and your expected output but you haven't shown the input that would produce that output so we don't have anything to copy/paste to test with and so it's harder for us to help you than it should be. [edit] your question to include a [mcve] with concise, testable sample input and the expected output from that input so we can help you. – Ed Morton Dec 29 '22 at 14:18
  • Also [edit] your question to explain how sub-directories under the 2 directories passed as arguments should be treated - should your script recurse down into them or ignore them or something else? – Ed Morton Dec 29 '22 at 14:24
  • Thank you for the comments which are useful to understand the limitations of the script. I need to test against directories with whitespace in the path and basename; and with subdirectories in common and that differ. Note I am running ```diff -qs $1 $2``` on live directories, then the text data output is input to the ```| awk``` pipe along with the two directory names. I am not sure, at this point, how to write a script that takes in three text variables (two directories and simulated output from ```diff -qs $1 $2```). I hope to edit the question soon to provide robust reproducible example. – SystemTheory Dec 29 '22 at 17:22
  • I'm not sure that `diff -qs $1 $2` is a good starting point - that can't work if any of your file or directory names contain newlines, for example. – Ed Morton Dec 29 '22 at 17:56

2 Answers

3

Hope this helps. I think awk's sub function is what you are asking about: here it strips everything up to the last slash, which does the job of basename.

Good luck!

    diff -qs $1 $2 | gawk -v L=$1 -v R=$2 \
      'BEGIN { printf "\n%-8s%-40s%-40s\n", "INFO", L, R } \
         $NF=="differ" { sub( /.*\//,"",$4) ; printf "%-8s%-40s%-40s\n", "diff", $4, $4 } \
         $NF=="identical" { sub( /.*\//,"",$4) ; printf "%-8s%-40s%-40s\n", "same", $4, $4 } \
         $3==L":" { sub( /.*\//,"",$4) ; printf "%-8s%-40s%-40s\n", "only", $4, "" } \
         $3==R":" { sub( /.*\//,"",$4) ; printf "%-8s%-40s%-40s\n", "only", "", $4 } '

INFO    dir1                                    dir2
only                                            bashrc.txt
only    debpkgs.txt
diff    direnv.txt                              direnv.txt
only    dpiDec2022.txt
only                                            mypkgs.txt
only                                            pyenv.txt
same    ssh.txt                                 ssh.txt
diff    usbquirks.txt                           usbquirks.txt
atl
  • Thank you. This answer works, it solves my problem, but only after I removed quotes from the second "$4" in the respective ```printf``` statements for "differ" and "identical" match. So instead of ```$4, "$4"``` it reads ```$4, $4```. – SystemTheory Dec 29 '22 at 02:38
  • thanks for the comment, updated the code to fix those aspects – atl Dec 29 '22 at 03:54
  • That would fail if any file names contained any whitespace. – Ed Morton Dec 29 '22 at 14:55
  • agree, there are other filenames that cause problems too, [a good discussion of `split` versus `sub`](https://unix.stackexchange.com/questions/134212/extract-file-name-from-path-in-awk-program) , some people suggest using `system` and invoking `basename` too. – atl Dec 29 '22 at 17:35
  • Best I can tell white space would be the only issue. Calling `basename` from `system()` would be wrong. [The answer](https://unix.stackexchange.com/a/490569/133219) suggesting that has multiple bugs and would be massively inefficient, do not do that. – Ed Morton Dec 29 '22 at 18:00
1

Directory Structure With Spaces (Without :) To Test Script

/home/joe/test dirdiff
├── dir1
│   ├── diff.txt
│   ├── diffout.txt
│   ├── only1.txt
│   ├── same space.txt
│   ├── same.txt
│   ├── space 1.txt
│   ├── subdir1
│   └── subdirC
└── dir2
    ├── diff.txt
    ├── only2.txt
    ├── same space.txt
    ├── same.txt
    ├── space 2.txt
    ├── subdir2
    └── subdirC

Reproducible Script Works for Paths & Names Containing Spaces but Not Colons

#!/bin/bash
input1="/home/joe/test dirdiff/dir1"
input2="/home/joe/test dirdiff/dir2"
diffout="Files /home/joe/test dirdiff/dir1/diff.txt and /home/joe/test dirdiff/dir2/diff.txt differ
Only in /home/joe/test dirdiff/dir1: diffout.txt
Only in /home/joe/test dirdiff/dir1: only1.txt
Only in /home/joe/test dirdiff/dir2: only2.txt
Files /home/joe/test dirdiff/dir1/same space.txt and /home/joe/test dirdiff/dir2/same space.txt are identical
Files /home/joe/test dirdiff/dir1/same.txt and /home/joe/test dirdiff/dir2/same.txt are identical
Only in /home/joe/test dirdiff/dir1: space 1.txt
Only in /home/joe/test dirdiff/dir2: space 2.txt
Only in /home/joe/test dirdiff/dir1: subdir1
Only in /home/joe/test dirdiff/dir2: subdir2
Common subdirectories: /home/joe/test dirdiff/dir1/subdirC and /home/joe/test dirdiff/dir2/subdirC"
printf "\n  %-8s%-40s%-40s\n" "BOTH" "$input1" "$input2"
# printf '%s\n' "$diffout" is necessary: with printf '%s' "$diffout" the
# input does not end with a newline, so the while loop would completely
# miss the last line of the variable.
while IFS= read -r line
  do
    #echo $line
    firstword=$(echo "$line" | awk '{print $1}')
    finalword=$(echo "$line" | awk '{print $NF}')
    if   [[ "$finalword" == "differ" ]]; then
      snip=${line%" differ"}
      echo "$snip" | awk -F[/] '{printf "  %-8s%-40s%-40s\n","diff",$NF,$NF}'
    elif [[ "$finalword" == "identical" ]]; then
      snip=${line%" are identical"}
      echo "$snip" | awk -F[/] '{printf "  %-8s%-40s%-40s\n","same",$NF,$NF}'
    elif [[ "$firstword" == "Common" ]]; then
      echo "$line" | awk -F[/] '{printf "  %-8s%-40s%-40s\n","comd",$NF,$NF}'
    elif [[ "$firstword" == "Only" ]]; then
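      snip=${line#"Only in "}   # drop the fixed "Only in " prefix
      mdir=${snip%:*}           # directory: everything before the last colon
      name=${snip#*:}           # name: everything after the first colon
      name=${name# }            # drop the single space that follows the colon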
      if [[ "$mdir" == "$input1" ]]; then
        printf "  %-8s%-40s\n" "" "$name"
      else
        printf "  %-8s%-40s%-40s\n" "" "" "$name"
      fi
    else
      echo ""
    fi
  done < <(printf '%s\n' "$diffout")

$ scriptname

  BOTH    /home/joe/test dirdiff/dir1             /home/joe/test dirdiff/dir2
  diff    diff.txt                                diff.txt
          diffout.txt
          only1.txt
                                                  only2.txt
  same    same space.txt                          same space.txt
  same    same.txt                                same.txt
          space 1.txt
                                                  space 2.txt
          subdir1
                                                  subdir2
  comd    subdirC                                 subdirC
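
As a possible alternative that avoids parsing diff output altogether, the comparison can be sketched with shell globbing and cmp -s. This is a different technique from the script above, not a drop-in replacement. The function name `dirt_compare` is hypothetical, and the sketch makes several simplifying assumptions: it is non-recursive, it skips names that start with a dot, it lists entries found only in the second directory after all entries of the first instead of in merged order, and it labels a file paired with a same-named directory as "diff".

```shell
#!/bin/bash
# Hypothetical function: compare two directories without parsing diff output.
dirt_compare() {
  d1=$1
  d2=$2
  printf '  %-8s%-40s%-40s\n' "BOTH" "$d1" "$d2"
  # Entries of the first directory: only there, common subdirectory, same, or different.
  for p in "$d1"/*; do
    [ -e "$p" ] || continue            # skip the literal pattern when the directory is empty
    name=${p##*/}
    if [ ! -e "$d2/$name" ]; then
      printf '  %-8s%-40s\n' "" "$name"
    elif [ -d "$p" ] && [ -d "$d2/$name" ]; then
      printf '  %-8s%-40s%-40s\n' "comd" "$name" "$name"
    elif cmp -s "$p" "$d2/$name"; then
      printf '  %-8s%-40s%-40s\n' "same" "$name" "$name"
    else
      printf '  %-8s%-40s%-40s\n' "diff" "$name" "$name"
    fi
  done
  # Entries that exist only in the second directory.
  for p in "$d2"/*; do
    [ -e "$p" ] || continue
    name=${p##*/}
    [ -e "$d1/$name" ] || printf '  %-8s%-40s%-40s\n' "" "" "$name"
  done
}
```

It would be called as dirt_compare "/home/joe/test dirdiff/dir1" "/home/joe/test dirdiff/dir2". Since no line of text is ever split, spaces and colons in the names need no special handling.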
SystemTheory