
How do you compare two arrays in Bash to find all intersecting values?

Let's say:
array1 contains values 1 and 2
array2 contains values 2 and 3

I should get back 2 as a result.

My own answer:

for item1 in $array1; do
    for item2 in $array2; do
        if [[ $item1 = $item2 ]]; then
            result=$result" "$item1
        fi
    done
done

I'm looking for alternate solutions as well.

Peter Mortensen
dabest1
  • I don’t think you’re going to find a better way to do this. Bash isn’t really built for array manipulation, and I can’t think of a command line tool that could be used for finding the intersection of two arrays. – Daniel Brockman Oct 24 '11 at 00:22

5 Answers


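Each element of list1 is looked up in list2, which is expanded into a single string (${list2[*]}) with framing blanks so that, for example, 1 cannot match inside 11. Because the pattern " $item " is quoted, [[ ... =~ ... ]] matches it as a literal substring rather than as a regular expression:

list1=( 1 2 3 4   6 7 8 9 10 11 12 )
list2=( 1 2 3   5 6   8 9    11 )

l2=" ${list2[*]} "                    # join list2 with blanks and add framing blanks
result=()
for item in "${list1[@]}"; do
  if [[ $l2 =~ " $item " ]]; then     # quoted pattern: matched as a literal substring
    result+=("$item")
  fi
done
echo "${result[@]}"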

The result is

1 2 3 6 8 9 11
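Because the right-hand side of =~ is quoted, Bash compares it as a literal string, so the same membership test can also be written as a glob match with ==. A minimal sketch using the same list1 and list2 values:

```shell
#!/usr/bin/env bash
list1=( 1 2 3 4   6 7 8 9 10 11 12 )
list2=( 1 2 3   5 6   8 9    11 )

l2=" ${list2[*]} "                 # list2 as one string, framed by blanks
result=()
for item in "${list1[@]}"; do
  # *" $item "* matches when " $item " occurs anywhere in $l2
  if [[ $l2 == *" $item "* ]]; then
    result+=("$item")
  fi
done
echo "${result[@]}"                # prints 1 2 3 6 8 9 11
```

Like the =~ version, this still relies on blanks as separators, so elements that themselves contain spaces are not handled.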
Fritz G. Mehner
  • Although many of the answers here would work for array or list intersection, I'm picking this one because it does not require Perl and uses a regexp to avoid a second loop. It also answers the original question of array intersection; although I was really looking for list intersection, I should rewrite my lists as arrays. Thanks, everyone. – dabest1 Oct 26 '11 at 21:23
  • This solution will not work if the array contains elements with escaped spaces. – Robsdedude Sep 10 '21 at 17:08

Taking @Raihan's answer and making it work with non-files (though file descriptors are created). I know it's a bit of a cheat, but it seemed like a good alternative.

A side effect is that the output array will be lexicographically sorted; I hope that's okay. (Also, I don't know what type of data you have, so I only tested with numbers; additional work may be needed if you have strings with special characters, etc.)

result=($(comm -12 <(for X in "${array1[@]}"; do echo "${X}"; done|sort)  <(for X in "${array2[@]}"; do echo "${X}"; done|sort)))

Testing:

$ array1=(1 17 33 99 109)
$ array2=(1 2 17 31 98 109)

$ result=($(comm -12 <(for X in "${array1[@]}"; do echo "${X}"; done|sort)  <(for X in "${array2[@]}"; do echo "${X}"; done|sort)))

$ echo ${result[@]}
1 109 17

P.S. I'm sure there is a way to output the array one value per line without the for loop; I just forget it (IFS?).
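One way to print one element per line without the loop is printf, which reuses its format string for every remaining argument (Noel Yap's comment makes the same point). A sketch of the same comm pipeline rewritten that way, using the test arrays above:

```shell
#!/usr/bin/env bash
array1=(1 17 33 99 109)
array2=(1 2 17 31 98 109)

# printf repeats '%s\n' for each element, giving one value per line;
# comm -12 then keeps only the lines common to both sorted inputs.
# The unquoted $(...) still word-splits, so elements containing spaces
# would be broken apart, just as in the loop version.
result=($(comm -12 <(printf '%s\n' "${array1[@]}" | sort) \
                   <(printf '%s\n' "${array2[@]}" | sort)))
echo "${result[@]}"
```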

nhed
  • Pretty good solution. I'm baffled as to what happens with two std-input files in the subshell; it looks like it is somehow using /proc/self/fd, but I'm not able to get it to work with anything else (e.g. cat/echo). – Soren Oct 24 '11 at 01:51
  • @Soren: See http://www.gnu.org/s/bash/manual/bash.html#Process-Substitution. Despite the similar appearance to std-input redirection, those expressions actually get replaced with filenames. I don't know why you can't get it to work with `cat`. On my system, `cat <(echo foo) <(echo bar)` prints `foo bar` (on two lines). Does that not happen on yours? – ruakh Oct 24 '11 at 02:15
  • `printf -- '%s\n' "${array[@]}"` will output each element on a separate line. – Noel Yap May 29 '15 at 21:08

Your answer won't work, for two reasons:

  • $array1 just expands to the first element of array1 (strictly, ${array1[0]}). This is documented behavior: the Bash manual states that referencing an array variable without a subscript is equivalent to referencing it with a subscript of 0.
  • After the first element gets added to result, result will then contain a space, so the next run of result=$result" "$item1 will misbehave horribly. (Instead of appending to result, it will run the command consisting of the first two items, with the environment variable result set to the empty string.) Correction: it turns out I was wrong about this one; word splitting doesn't take place inside assignments. (See the comments below.)
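Both points are easy to check in a shell. The snippet below is a small demonstration with made-up values: an array referenced without a subscript yields only element 0, and an assignment like result=$result" "$item keeps the whole string because assignments are not word-split.

```shell
#!/usr/bin/env bash
array1=(1 2)

# Without a subscript, $array1 is the same as ${array1[0]}
first=$array1
echo "$first"            # prints 1

# Assignments are not word-split, so this appends instead of running a command
result=
for item in a b c; do
  result=$result" "$item
done
echo "[$result]"         # prints [ a b c]
```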

What you want is this:

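result=()                                  # collect matches in a real array
for item1 in "${array1[@]}"; do            # quote so elements with spaces stay whole
    for item2 in "${array2[@]}"; do
        if [[ $item1 = "$item2" ]]; then   # quoted RHS: compare literally, not as a glob
            result+=("$item1")
        fi
    done
done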
ruakh
  • Maybe I got array and list confused. Is there a difference between arrays and lists in bash? – dabest1 Oct 24 '11 at 01:04
  • @dabest1: "List" isn't a technical term in Bash. If you didn't mean "array", then I think you must have meant something vague, along the lines of "a string containing whitespace, where the whitespace should be interpreted as separating the components of the string". Obviously there's no one-word term for that. :-) If you post some of the surrounding code that shows how these "arrays" are initialized, and how you're using them, that will probably clarify a lot. – ruakh Oct 24 '11 at 01:11
  • Also -- *regardless* of what you meant, your line `result=$result" "$item1` is not going to do what you think, unless you've set the `IFS` variable to something weird, which I really doubt you have. (And if you *have* set the `IFS` variable to something weird, then you've got different problems!) – ruakh Oct 24 '11 at 01:16
  • @ruakh: Thanks, I was not too clear in the question. I am using a list of items separated by spaces, and `result=$result" "$item1` seems to work fine, even though I am not setting IFS to anything. I will leave the question as is, as it will still help others with array comparison questions. – dabest1 Oct 24 '11 at 01:26
  • @dabest: O.K., so you're not using "arrays". Re: "`result=$result" "$item1` seems to work fine": Oops, my mistake: it turns out that (according to http://www.gnu.org/s/bash/manual/bash.html#Shell-Parameters) word splitting is not performed on variable assignments. Mea culpa. – ruakh Oct 24 '11 at 01:44

If you were looking for intersecting lines in two files (instead of arrays), you could use the comm command; note that comm expects both files to be sorted.

$ comm -12 file1 file2
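comm only gives correct results on sorted input, so for unsorted files one option is to sort them on the fly with process substitution. A sketch with two throwaway files (the names file1 and file2 and their contents are made up for the demo):

```shell
#!/usr/bin/env bash
# Hypothetical sample files, created in a temporary directory for this demo
tmpdir=$(mktemp -d)
printf '%s\n' 3 1 2 > "$tmpdir/file1"
printf '%s\n' 2 4 3 > "$tmpdir/file2"

# comm assumes both inputs are sorted, so sort each file first
common=$(comm -12 <(sort "$tmpdir/file1") <(sort "$tmpdir/file2"))
printf '%s\n' "$common"   # prints 2 and 3, one per line

rm -r -- "$tmpdir"
```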
Raihan

Now that I understand what you mean by "array", I think -- first of all -- that you should consider using actual Bash arrays. They're much more flexible, in that (for example) array elements can contain whitespace, and you can avoid the risk that * and ? will trigger filename expansion.
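If the values currently live in whitespace-delimited strings, read -ra is one way to turn them into real arrays; read does not perform filename expansion, so * and ? survive as literal characters. A sketch, where the string contents are hypothetical:

```shell
#!/usr/bin/env bash
list1="1 2 *"     # hypothetical whitespace-delimited strings
list2="2 3 *"

# Split on IFS (blanks by default) into arrays; no globbing happens here
read -r -a array1 <<< "$list1"
read -r -a array2 <<< "$list2"

printf '%s\n' "${#array1[@]}"     # prints 3
printf '<%s>\n' "${array1[@]}"    # prints <1>, <2> and <*> on separate lines
```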

But if you prefer to use your existing approach of whitespace-delimited strings, then I agree with RHT's suggestion to use Perl:

result=$(perl -e 'my %array2 = map +($_ => 1), split /\s+/, $ARGV[1];
                  print join " ", grep $array2{$_}, split /\s+/, $ARGV[0]
                 ' "$array1" "$array2")

(The line-breaks are just for readability; you can get rid of them if you want.)

In the above Bash command, the embedded Perl program creates a hash named %array2 containing the elements of the second array, and then it prints any elements of the first array that exist in %array2.

This will behave slightly differently from your code in how it handles duplicate values in the second array; in your code, if array1 contains x twice and array2 contains x three times, then result will contain x six times, whereas in my code, result will contain x only twice. I don't know if that matters, since I don't know your exact requirements.
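On Bash 4 or later, the same hash-lookup idea can be done in pure Bash with an associative array instead of Perl. This is a swapped-in technique, not part of the answer above; a minimal sketch assuming real Bash arrays, with made-up sample values:

```shell
#!/usr/bin/env bash
# Requires Bash 4+ for declare -A
array1=(1 2 x x)
array2=(2 3 x x x)

# Record each element of array2 as a key (duplicates collapse to one key)
declare -A seen=()
for item in "${array2[@]}"; do
  seen[$item]=1
done

# Keep elements of array1 that are keys in seen, preserving array1's order
result=()
for item in "${array1[@]}"; do
  if [[ -n ${seen[$item]+set} ]]; then
    result+=("$item")
  fi
done
echo "${result[@]}"       # prints 2 x x
```

As with the Perl version, x appears twice in the result (once per occurrence in array1), not six times.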

ruakh