declare -a array1=( 1 2 3 4 5 6 7 8 9 10 11 12 )
declare -a array2=( 1 2 3 5 6 7 9 10 11 12 )

In bash, how can I get a third array of the values that are present in array1 but absent in array2? In the above example, the expected output is ( 4 8 )

Benjamin W.
Remi.b
  • @andlrc. Thanks. I don't quite understand what a command substitution is (but will investigate the question). I have changed the first to correctly define an array. – Remi.b Jun 08 '16 at 22:59
  • @Remi.b Any reason you are doing this? This isn't really what shell scripting is for. Of course it's possible, but do you have a specific goal in mind? – Andreas Louv Jun 08 '16 at 23:08

2 Answers

With mapfile, comm, sort and process substitution:

array1=( 1 2 3 4 5 6 7 8 9 10 11 12 )
array2=( 1 2 3 5 6 7 9 10 11 12 )
mapfile -t arr < <(comm -23 --nocheck-order \
                       <(printf "%s\n" "${array1[@]}" | sort -n) \
                       <(printf "%s\n" "${array2[@]}" | sort -n))

Result:

$ declare -p arr
declare -a arr='([0]="4" [1]="8")'

Explanation, from the inside out:

  • Print the array one element per line and sort:

    $ printf "%s\n" "${array1[@]}" | sort -n
    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    

    and the same for array2.

  • Wrap these pipes into process substitutions and use them as arguments for comm:

    comm -23 --nocheck-order \
        <(printf "%s\n" "${array1[@]}" | sort -n) \
        <(printf "%s\n" "${array2[@]}" | sort -n)
    

    The -23 reduces the result to values unique to the first array; --nocheck-order suppresses a warning about the input not being lexicographically sorted. The output of this is

    4
    8
    
  • Read each line into an array element with mapfile (-t removes the newlines):

    mapfile -t arr < <(comm -23 --nocheck-order \
                       <(printf "%s\n" "${array1[@]}" | sort -n) \
                       <(printf "%s\n" "${array2[@]}" | sort -n))
    

    Now, arr contains the two values as shown above.

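The sort step is not strictly required for this example, because both arrays are already in order, but it makes the solution work for unsorted arrays as well. Be aware that comm compares lines lexicographically, so with sort -n and --nocheck-order it can misreport values for some inputs (for instance, when array1 holds 9 and 10 but array2 holds only 10); sorting both inputs with plain sort avoids this.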
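As a hedged variant of the approach above (not the answer's original command), the inputs can be sorted with plain sort, which is the order comm expects, so --nocheck-order is no longer needed; a final sort -n puts the result back into numeric order. The names a, b and edge are made up for illustration, and the second call only demonstrates an input where numeric ordering and lexicographic ordering disagree.

```shell
array1=( 1 2 3 4 5 6 7 8 9 10 11 12 )
array2=( 1 2 3 5 6 7 9 10 11 12 )

# Sort both inputs lexicographically (plain sort), the order comm expects,
# then restore numeric order for the lines that survive.
mapfile -t arr < <(comm -23 \
                       <(printf '%s\n' "${array1[@]}" | sort) \
                       <(printf '%s\n' "${array2[@]}" | sort) | sort -n)

# Hypothetical edge case: "9" sorts after "10" lexicographically, so the
# inputs must be in that order for comm to pair the two 10s correctly.
a=( 9 10 )
b=( 10 )
mapfile -t edge < <(comm -23 \
                        <(printf '%s\n' "${a[@]}" | sort) \
                        <(printf '%s\n' "${b[@]}" | sort))

declare -p arr edge
```

With the example arrays, arr should again hold 4 and 8, and edge should hold only 9.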

Benjamin W.
  • You could use `printf "%s\n" "${array1[@]}"` instead of the IFS+echo construct. In this example, using `"${array1[@]}"` instead of `"${array1[*]}"` matters with the `printf` variant; using `*` you get a single string passed as an argument to `printf`. – Jonathan Leffler Jun 09 '16 at 00:26
  • @JonathanLeffler That's indeed more elegant. `"${array1[*]}"` instead of `"${array1[@]}"` matters also in my variant, by the way, as otherwise `IFS` isn't used to separate the array elements - just a space. – Benjamin W. Jun 09 '16 at 03:25
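The difference between `[@]` and `[*]` discussed in these comments can be seen in a small sketch. The earlier IFS+echo version of the answer is not shown in the thread, so the third command is only a guess at its shape; the variable names at_lines, star_lines and ifs_lines are made up for illustration.

```shell
array1=( 1 2 3 )

# "${array1[@]}" expands to one argument per element; printf reuses its
# format string for each argument, giving one line per element.
mapfile -t at_lines < <(printf '%s\n' "${array1[@]}")

# "${array1[*]}" expands to a single argument, joined by the first
# character of IFS (a space by default), so printf emits just one line.
mapfile -t star_lines < <(printf '%s\n' "${array1[*]}")

# Assumed shape of the IFS+echo construct: with IFS set to a newline
# (inside the process substitution's subshell only), [*] joins the
# elements with newlines instead.
mapfile -t ifs_lines < <(IFS=$'\n'; echo "${array1[*]}")

# Number of lines each variant produced.
echo "${#at_lines[@]} ${#star_lines[@]} ${#ifs_lines[@]}"   # prints: 3 1 3
```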

The following one-liner will do the trick:

diff -y <(printf '%s\n' "${array2[@]}") <(printf '%s\n' "${array1[@]}") | grep -Po '[\|\<\>][\t]\K[0-9]+$'

printf prints each element on its own line. diff -y then gives this side-by-side output:

1                 1
2                 2
3                 3
                > 4
5                 5
6                 6
7                 7
                > 8
9                 9
10                10
11                11
12                12

Now all you have to do is pick out the numbers that follow a marker (|, < or >). I used grep for this, but sed can be used too. If your arrays are not sorted, simply add sort to each printf inside its process substitution, like this:

<(printf '%s\n' "${arrayN[@]}" | sort -n)
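For comparison, here is a minimal sketch that uses plain diff instead of diff -y; this is a swapped-in variation, not the one-liner above. In diff's default output, lines found only in the second input are prefixed with "> ", so sed can print just those lines with the prefix stripped. The array name only_in_1 is made up for illustration, and the sketch assumes both inputs are in the same order and contain no duplicates.

```shell
array1=( 1 2 3 4 5 6 7 8 9 10 11 12 )
array2=( 1 2 3 5 6 7 9 10 11 12 )

# array2 is the first input and array1 the second, so values present only
# in array1 show up as "> value" lines; sed keeps those and drops the marker.
mapfile -t only_in_1 < <(diff \
    <(printf '%s\n' "${array2[@]}" | sort -n) \
    <(printf '%s\n' "${array1[@]}" | sort -n) | sed -n 's/^> //p')

declare -p only_in_1
```

Because diff compares line sequences rather than sets, this only behaves like a set difference when both inputs are sorted the same way.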
blackSmith