
What I am trying to do is run a bash script that looks somewhat like this:

#!/usr/bin/bash

only1=$(comm -23 $1 $2 | wc -l)
only2=$(comm -13 $1 $2 | wc -l)
common=$(comm -12 $1 $2 | wc -l)

echo -e "${only1} only in $1"
echo -e "${only2} only in $2"
echo -e "${common} in both"

If I execute the script as script.sh file1 file2, it works fine. However, if I use it as script.sh <(grep 'foo' file1) <(grep 'foo' file2), it fails: the virtual files of the form /dev/fd/62 seem to be available only to the first command (only1 in the script). The output is:

262 only in /dev/fd/63
0 only in /dev/fd/62
0 in both

Is there a way to make these virtual files available to all of the commands in the script?

jww
vkkodali
  • You should supply some test data. It is not clear if `262 only in /dev/fd/63` or `0 only in /dev/fd/62` is expected or not. – jww Oct 25 '19 at 18:28

1 Answer


The issue here is that the first invocation of comm will read to the end of both input files.

As you'd like to be able to provide pipes as the input (instead of a "real" file), you'll need to read each input only once and then provide that data to the subsequent commands. With pipes, as soon as data is read, it's gone and isn't coming back.

For example:

#!/bin/bash -eu

# clean up temporary files on exit (-f keeps rm quiet if a file was never created)
trap 'rm -f -- "${TMP_FILE1:-}" "${TMP_FILE2:-}"' EXIT

# copy each input to a regular file so that it can be read more than once
TMP_FILE1=$(mktemp)
cat < "$1" > "$TMP_FILE1"

TMP_FILE2=$(mktemp)
cat < "$2" > "$TMP_FILE2"

only1=$(comm -23 "$TMP_FILE1" "$TMP_FILE2" | wc -l)
only2=$(comm -13 "$TMP_FILE1" "$TMP_FILE2" | wc -l)
common=$(comm -12 "$TMP_FILE1" "$TMP_FILE2" | wc -l)

echo -e "${only1} only in $1"
echo -e "${only2} only in $2"
echo -e "${common} in both"
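To see why the copy is needed, here is a minimal sketch that reads the same process substitution twice. The function name read_twice is made up for this illustration. On Linux, the second read is expected to find the pipe already drained, which matches the zeros in the question's output.

```shell
#!/bin/bash
# read_twice is a hypothetical helper used only for this demonstration.
# It counts the lines behind the same path two times in a row.
read_twice() {
    local first second
    first=$(wc -l < "$1")    # consumes everything the pipe has to offer
    second=$(wc -l < "$1")   # the data is already gone, so nothing is left
    # $(( )) strips the padding that some wc implementations add
    echo "first read: $((first)) lines, second read: $((second)) lines"
}

read_twice <(printf 'a\nb\nc\n')
```

With a regular file in place of the process substitution, both counts would be the same, which is why copying to a temporary file fixes the script.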

If your files are small enough, then you can get away with reading them into variables:

#!/bin/bash -eu

FILE1=$( < "$1" )
FILE2=$( < "$2" )

only1=$(comm -23 <( echo "$FILE1" ) <( echo "$FILE2" ) | wc -l)
only2=$(comm -13 <( echo "$FILE1" ) <( echo "$FILE2" ) | wc -l)
common=$(comm -12 <( echo "$FILE1" ) <( echo "$FILE2" ) | wc -l)

echo -e "${only1} only in $1"
echo -e "${only2} only in $2"
echo -e "${common} in both"
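Another variation, which is not part of the answer above, is to avoid re-reading altogether: run comm a single time and tally its three output columns. The sketch below assumes the default comm output format (one leading tab for column 2, two for column 3) and assumes that no input line itself starts with a tab. The name count_comm is hypothetical.

```shell
#!/bin/bash
# count_comm is a hypothetical helper, not from the original answer.
# It runs comm once and prints three counts on one line:
#   <only in first> <only in second> <in both>
# Assumption: input lines never begin with a tab, because leading tabs
# are how comm marks its columns.
count_comm() {
    comm "$1" "$2" | awk '
        /^\t\t/ { both++;  next }   # two leading tabs: column 3
        /^\t/   { only2++; next }   # one leading tab: column 2
                { only1++ }         # no leading tab: column 1
        END     { printf "%d %d %d\n", only1, only2, both }'
}

# Each process substitution is read exactly once.
read -r only1 only2 common < <(count_comm <(printf 'a\nb\nc\n') <(printf 'b\nc\nd\ne\n'))
echo "${only1} only in first, ${only2} only in second, ${common} in both"
```

Because each input is opened only once, this works with pipes directly and needs neither temporary files nor variables.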

Please also note that comm only works correctly on sorted input, so you probably want to run sort on the inputs unless you are fully aware of the consequences of using unsorted data. Depending on which variant you use, replace the line that reads the input with one of these:

sort < "$1" > "$TMP_FILE1"
FILE1=$( sort < "$1" )
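Putting the pieces together, a minimal sketch of the temporary-file variant with sorting might look like the following. It is wrapped in a function, compare_sorted, whose name and output wording are made up for this example. Setting LC_ALL=C is an assumption on my part: it makes sort and comm agree on byte-order collation, which may not be the ordering you want for non-ASCII data.

```shell
#!/bin/bash
# compare_sorted is a hypothetical wrapper used only for this sketch.
compare_sorted() {
    local tmp1 tmp2 only1 only2 common
    tmp1=$(mktemp) || return
    tmp2=$(mktemp) || { rm -f -- "$tmp1"; return 1; }

    # Read each input exactly once, sorting it on the way into a regular file.
    # Assumption: byte-order (C locale) sorting is acceptable for the data.
    LC_ALL=C sort < "$1" > "$tmp1"
    LC_ALL=C sort < "$2" > "$tmp2"

    # Regular files can be re-read, so three comm runs are fine here.
    only1=$(LC_ALL=C comm -23 "$tmp1" "$tmp2" | wc -l)
    only2=$(LC_ALL=C comm -13 "$tmp1" "$tmp2" | wc -l)
    common=$(LC_ALL=C comm -12 "$tmp1" "$tmp2" | wc -l)

    rm -f -- "$tmp1" "$tmp2"

    # $(( )) strips the padding that some wc implementations add
    echo "$((only1)) only in first input"
    echo "$((only2)) only in second input"
    echo "$((common)) in both"
}

compare_sorted <(printf 'c\na\nb\n') <(printf 'e\nd\nc\nb\n')
```

In a standalone script, an EXIT trap like the one in the first example is a more robust way to remove the temporary files than the explicit rm used here.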
Attie
  • I would just store the content in variables, not files: `dataA=$(< "$1" )` and then use process substitutions for the comm commands: `only1=$( comm -23 <(echo "$dataA") <(echo "$dataB") | wc -l )` – glenn jackman Oct 25 '19 at 16:58
  • In case the script is terminated early, it's a good idea to clean up the temp files in a trap: near the beginning of the code add `cleanup() { rm -f "$TMP_FILE1" "$TMP_FILE2"; }` and then `trap cleanup EXIT` – glenn jackman Oct 25 '19 at 16:59
  • Avoid using ALLCAPS variable names: you can [shoot yourself in the foot](https://stackoverflow.com/q/28310594/7552) – glenn jackman Oct 25 '19 at 17:20