Find out what values of `array1` are missing in `array2`

Question

```
declare -a array1=( 1 2 3 4 5 6 7 8 9 10 11 12 )
declare -a array2=( 1 2 3 5 6 7 9 10 11 12 )
```

In bash, how can I get a third array of the values that are present in array1 but absent from array2? In the example above, the expected result is ( 4 8 ).
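For reference, here is one pure-bash way to express this set difference. It is not taken from either answer below; it is a minimal sketch that assumes bash 4+ (for associative arrays) and non-empty element values. The lookup-table name `in_array2` is made up for illustration.

```bash
#!/usr/bin/env bash
# Pure-bash set difference: elements of array1 that do not occur in array2.
# Assumes bash 4+ (declare -A) and non-empty element values.
declare -a array1=( 1 2 3 4 5 6 7 8 9 10 11 12 )
declare -a array2=( 1 2 3 5 6 7 9 10 11 12 )

# Record every value of array2 as a key of a lookup table (hypothetical name).
declare -A in_array2=()
for v in "${array2[@]}"; do
    in_array2["$v"]=1
done

# Keep the values of array1, in their original order, that have no key.
declare -a missing=()
for v in "${array1[@]}"; do
    [[ -n ${in_array2["$v"]+set} ]] || missing+=( "$v" )
done

declare -p missing
```

Unlike the sort-based answers, this preserves the order of array1 and needs no external commands, at the cost of requiring bash 4.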

@andlrc Thanks. I don't quite understand what a command substitution is (but will look into it). I have changed the first line so that it correctly defines an array. — Remi.b, Jun 08 '16 at 22:59
@Remi.b Any reason you are doing this? This isn't really what shell scripting is for. Of course it's possible, but do you have a specific goal in mind? — Andreas Louv, Jun 08 '16 at 23:08
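Since command substitution comes up in the comments, and the accepted answer relies on the related process substitution, here is a small sketch contrasting the two. The variable names are illustrative only, and the `/dev/fd/63` path in the comment is just a typical example; the actual name varies by system.

```bash
#!/usr/bin/env bash
# Command substitution: $(cmd) is replaced by cmd's standard output,
# with trailing newlines stripped, so the result is a single string.
letters=$(printf '%s\n' a b c)
echo "$letters"                      # a, b and c on separate lines

# Process substitution: <(cmd) is replaced by a file name (often /dev/fd/N)
# from which cmd's output can be read, so it fits wherever a file is expected.
echo <(true)                         # prints that file name, e.g. /dev/fd/63
count=$(wc -l < <(printf '%s\n' a b c))
echo "line count: $count"            # wc -l reads three lines from it
```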

Benjamin W. · Accepted Answer · 2016-06-09T03:28:09.900

With mapfile, comm, sort and process substitution:

```
array1=( 1 2 3 4 5 6 7 8 9 10 11 12 )
array2=( 1 2 3 5 6 7 9 10 11 12 )
mapfile -t arr < <(comm -23 --nocheck-order \
                       <(printf "%s\n" "${array1[@]}" | sort -n) \
                       <(printf "%s\n" "${array2[@]}" | sort -n))
```

Result:

```
$ declare -p arr
declare -a arr='([0]="4" [1]="8")'
```

Explanation, from the inside out:

Print the array one element per line and sort:

```
$ printf "%s\n" "${array1[@]}" | sort -n
1
2
3
4
5
6
7
8
9
10
11
12
```

and the same for array2.
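The `-n` flag matters because it changes the order: `sort -n` compares numerically, while plain `sort` compares strings. A small sketch with made-up values (`LC_ALL=C` is used here only to pin down byte-wise string order):

```bash
#!/usr/bin/env bash
# Same elements, two orderings: numeric (sort -n) vs byte-wise string order.
a=( 9 10 100 2 )

mapfile -t numeric  < <(printf '%s\n' "${a[@]}" | sort -n)
mapfile -t bytewise < <(printf '%s\n' "${a[@]}" | LC_ALL=C sort)

echo "sort -n:       ${numeric[*]}"    # 2 9 10 100
echo "LC_ALL=C sort: ${bytewise[*]}"   # 10 100 2 9
```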

Wrap these pipes into process substitutions and use them as arguments for comm:
```
comm -23 --nocheck-order \
    <(printf "%s\n" "${array1[@]}" | sort -n) \
    <(printf "%s\n" "${array2[@]}" | sort -n)
```
The `-23` suppresses column 2 (lines only in the second input) and column 3 (lines in both), leaving only the values unique to the first array; `--nocheck-order` suppresses the warning about the input not being lexicographically sorted. The output of this is
```
4
8
```
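To see what the `-1`/`-2`/`-3` flags select, here is a minimal sketch of comm's three output columns on two tiny, already sorted inputs (the letters are arbitrary example data):

```bash
#!/usr/bin/env bash
# comm reads two sorted inputs and prints three columns:
#   1 = lines only in the first, 2 = lines only in the second, 3 = lines in both.
# -1, -2 and -3 suppress the matching column.
export LC_ALL=C                       # byte-wise order for both sort and comm
left=( a b c d )                      # already sorted
right=( b d e )                       # already sorted

only_left=$(comm -23 <(printf '%s\n' "${left[@]}") <(printf '%s\n' "${right[@]}"))
only_right=$(comm -13 <(printf '%s\n' "${left[@]}") <(printf '%s\n' "${right[@]}"))
both=$(comm -12 <(printf '%s\n' "${left[@]}") <(printf '%s\n' "${right[@]}"))

echo "only in left:  ${only_left//$'\n'/ }"    # a c
echo "only in right: ${only_right//$'\n'/ }"   # e
echo "in both:       ${both//$'\n'/ }"         # b d
```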

Read each line into an array element with mapfile (`-t` strips the trailing newline from each line):

```
mapfile -t arr < <(comm -23 --nocheck-order \
                   <(printf "%s\n" "${array1[@]}" | sort -n) \
                   <(printf "%s\n" "${array2[@]}" | sort -n))
```

Now, arr contains the two values as shown above.
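mapfile (also known as readarray) was added in bash 4.0, so it is missing from older shells such as the bash 3.2 that macOS ships as `/bin/bash`. A sketch of the same step with a `while read` loop, assuming only bash 3.1+ features; it uses plain `sort` so that the input is in the order comm expects and `--nocheck-order` is unnecessary:

```bash
#!/usr/bin/env bash
# Same result as mapfile -t, written without mapfile (works in bash 3.2).
array1=( 1 2 3 4 5 6 7 8 9 10 11 12 )
array2=( 1 2 3 5 6 7 9 10 11 12 )

arr=()
# IFS= keeps leading/trailing whitespace; -r keeps backslashes literal.
while IFS= read -r line; do
    arr+=( "$line" )                  # one array element per output line
done < <(comm -23 <(printf '%s\n' "${array1[@]}" | sort) \
                  <(printf '%s\n' "${array2[@]}" | sort))

declare -p arr
```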

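The sort step is not strictly required for this example, but it makes the solution work for unsorted arrays as well. One caveat: comm compares lines as strings, not numbers, so `sort -n` combined with `--nocheck-order` happens to give the right answer for these two arrays but can report wrong values for other inputs; plain `sort` (without `-n`) produces the order comm expects.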
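As a concrete case, with array1=( 9 10 ) and array2=( 10 ), the `sort -n` plus `--nocheck-order` version also reports 10, because comm sees "9" before "10" as out of order and loses track of the match. A sketch of the string-sorted variant, keeping the accepted answer's structure (the small arrays are example data):

```bash
#!/usr/bin/env bash
# Sort with plain `sort` (string order) so the input matches how comm compares.
# LC_ALL=C gives sort and comm the same byte-wise collation, and no
# --nocheck-order is needed because the input really is sorted for comm.
export LC_ALL=C

array1=( 9 10 )
array2=( 10 )

mapfile -t arr < <(comm -23 <(printf '%s\n' "${array1[@]}" | sort) \
                            <(printf '%s\n' "${array2[@]}" | sort))
declare -p arr                        # only "9" remains
```

The result comes out in string order, so if numeric order matters for the final array, one more `sort -n` on comm's output restores it.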

You could use `printf "%s\n" "${array1[@]}"` instead of the IFS+echo construct. In this example, using `"${array1[@]}"` instead of `"${array1[*]}"` matters with the `printf` variant; using `*` you get a single string passed as an argument to `printf`. — Jonathan Leffler, Jun 09 '16 at 00:26
@JonathanLeffler That's indeed more elegant. `"${array1[*]}"` instead of `"${array1[@]}"` matters also in my variant, by the way, as otherwise `IFS` isn't used to separate the array elements - just a space. — Benjamin W., Jun 09 '16 at 03:25
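A minimal sketch of the `[@]` versus `[*]` difference discussed in these comments (the array `a` is example data):

```bash
#!/usr/bin/env bash
a=( 1 2 3 )

# "${a[@]}": one word per element, so printf applies '%s\n' to each of them.
at=$(printf '%s\n' "${a[@]}")
# "${a[*]}": a single word, elements joined by the first character of IFS
# (a space by default), so printf prints just one line.
star=$(printf '%s\n' "${a[*]}")
# With IFS set to a newline, "${a[*]}" joins with newlines instead; the
# command substitution runs in a subshell, so the caller's IFS is unchanged.
joined=$(IFS=$'\n'; echo "${a[*]}")

printf '[%s]\n' "$at" "$star" "$joined"
```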

blackSmith · Answer 2 · 2016-06-09T11:10:31.397

The following one-liner will do the trick:

```
diff -y <(printf '%s\n' "${array2[@]}") <(printf '%s\n' "${array1[@]}") | grep -Po '[\|\<\>][\t]\K[0-9]+$'
```

The printf prints each element on its own line. `diff -y` then produces this side-by-side output:

```
1           1
2           2
3           3
5         | 4
6           5
7           6
9           7
10        | 8
11          9
12          10
            11
            12
```

Now all you have to do is extract the numbers that follow the gutter marker: `|` (or sometimes `<` or `>`). I used grep for this, but sed can be used too. If your arrays are not sorted, simply pipe each printf through sort, like this:

```
<(printf '%s\n' "${arrayN[@]}" | sort -n)
```
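If GNU diff is available, its line-format options can replace the `grep -P` step by printing only the lines added in the second input. This is a sketch under two assumptions: GNU diff (`--old-line-format`, `--new-line-format` and `--unchanged-line-format` are GNU extensions, and other diff implementations may lack them), and both inputs listing common values in the same relative order, which the sort above guarantees.

```bash
#!/usr/bin/env bash
# Assumes GNU diff: print only lines that appear in the second input (array1)
# but not in the first (array2); removed and unchanged lines format as nothing.
array1=( 1 2 3 4 5 6 7 8 9 10 11 12 )
array2=( 1 2 3 5 6 7 9 10 11 12 )

mapfile -t arr < <(diff --old-line-format='' \
                        --unchanged-line-format='' \
                        --new-line-format='%L' \
                        <(printf '%s\n' "${array2[@]}") \
                        <(printf '%s\n' "${array1[@]}"))
declare -p arr
```

`%L` is the line's contents including its trailing newline, so each added value lands in its own array element.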
