32

Is there any efficient way to get intersection of two slices in Go?

I want to avoid nested for loop like solution
slice1 := []string{"foo", "bar","hello"}
slice2 := []string{"foo", "bar"}

intersection(slice1, slice2)
=> ["foo", "bar"]
order of string does not matter
Jonathan Hall
  • 75,165
  • 16
  • 143
  • 189
bandi shankar
  • 597
  • 2
  • 6
  • 8

7 Answers7

39

How do I get the intersection between two arrays as a new array?

  • Simple Intersection: Compare each element in A to each in B (O(n^2))
  • Hash Intersection: Put them into a hash table (O(n))
  • Sorted Intersection: Sort A and do an optimized intersection (O(n*log(n)))

All of which are implemented here

https://github.com/juliangruber/go-intersect/blob/master/intersect.go

Carl Walsh
  • 6,100
  • 2
  • 46
  • 50
KeksArmee
  • 1,349
  • 14
  • 21
  • 1
    do we have any in build library for it in go ? – bandi shankar Jul 06 '17 at 18:27
  • No. There is no such build in package in go for this. – Mayank Patel Jul 06 '17 at 18:31
  • 1
    Remember that although an $O(n)$ solution is better than an $O(n^2)$ solution for big $n$, if $n$ is small it's probably better to go with the $O(n^2)$ solution than with the $O(n)$ solution since the latter might have more overhead. – md2perpe Jul 06 '17 at 18:55
  • 2
    @md2perpe you can't make a generalization like that based on what "might" have more overhead. It's probably better to go with whatever is most efficient, and best to actually test with each and see what works meets requirements for a specific situation. – Adrian Jul 06 '17 at 20:03
  • 8
    @Adrian, tell Rob Pike (https://en.wikipedia.org/wiki/Rob_Pike) that. See his Rule 3 here: http://users.ece.utexas.edu/~adnan/pike.html – md2perpe Jul 06 '17 at 20:15
  • I know who RP is and that his word, like everyone else's, is not gospel. Just because an algorithm is faster, does not mean it's "fancy". And "Rule 2" is exactly what I was saying - test the available options and choose what works best in that situation based on empirical observation. – Adrian Jul 06 '17 at 20:20
  • @bandishankar Just run `go get "github.com/juliangruber/go-intersect"`. You can then use the same path to access its code. Don't forget to give credit to the guy :) – KeksArmee Jul 07 '17 at 15:09
5

simple, generic and mutiple slices ! (Go 1.18)

Time Complexity : may be linear

func interSection[T constraints.Ordered](pS ...[]T) []T {
    hash := make(map[T]*int) // value, counter
    result := make([]T, 0)
    for _, slice := range pS {
        duplicationHash := make(map[T]bool) // duplication checking for individual slice
        for _, value := range slice {
            if _, isDup := duplicationHash[value]; !isDup { // is not duplicated in slice
                if counter := hash[value]; counter != nil { // is found in hash counter map
                    if *counter++; *counter >= len(pS) { // is found in every slice
                        result = append(result, value)
                    }
                } else { // not found in hash counter map
                    i := 1
                    hash[value] = &i
                }
                duplicationHash[value] = true
            }
        }
    }
    return result
}
func main() {
    slice1 := []string{"foo", "bar", "hello"}
    slice2 := []string{"foo", "bar"}
    fmt.Println(interSection(slice1, slice2))
    // [foo bar]

    ints1 := []int{1, 2, 3, 9, 8}
    ints2 := []int{10, 4, 2, 4, 8, 9} // have duplicated values
    ints3 := []int{2, 4, 8, 1}
    fmt.Println(interSection(ints1, ints2, ints3))
    // [2 8]
}

playground : https://go.dev/play/p/lE79D0kOznZ

Vivian K. Scott
  • 515
  • 2
  • 5
  • 13
4

It's a best method for intersection two slice. Time complexity is too low.

Time Complexity : O(m+n)

m = length of first slice.

n = length of second slice.

func intersection(s1, s2 []string) (inter []string) {
    hash := make(map[string]bool)
    for _, e := range s1 {
        hash[e] = true
    }
    for _, e := range s2 {
        // If elements present in the hashmap then append intersection list.
        if hash[e] {
            inter = append(inter, e)
        }
    }
    //Remove dups from slice.
    inter = removeDups(inter)
    return
}

//Remove dups from slice.
func removeDups(elements []string)(nodups []string) {
    encountered := make(map[string]bool)
    for _, element := range elements {
        if !encountered[element] {
            nodups = append(nodups, element)
            encountered[element] = true
        }
    }
    return
}
  • 1
    you don't need `removeDups` if you use the boolean value of the hash map as an indication of whether you've already added the same element to the resulting arary. – Dan Markhasin Dec 05 '20 at 08:51
2

if there exists no blank in your []string, maybe you need this simple code:

func filter(src []string) (res []string) {
    for _, s := range src {
        newStr := strings.Join(res, " ")
        if !strings.Contains(newStr, s) {
            res = append(res, s)
        }
    }
    return
}

func intersections(section1, section2 []string) (intersection []string) {
    str1 := strings.Join(filter(section1), " ")
    for _, s := range filter(section2) {
        if strings.Contains(str1, s) {
            intersection = append(intersection, s)
        }
    }
    return
}
Nevercare
  • 81
  • 3
1

https://github.com/viant/toolbox/blob/a46fd679bbc5d07294b1d1b646aeacd44e2c7d50/collections.go#L869-L920

Another O(m+n) Time Complexity solution that uses a hashmap. It has two differences compared to the other solutions discussed here.

  • Passing the target slice as a parameter instead of new slice returned
  • Faster to use for commonly used types like string/int instead of reflection for all
Jack Wilsdon
  • 6,706
  • 11
  • 44
  • 87
1

Try it

https://go.dev/play/p/eGGcyIlZD6y

first := []string{"one", "two", "three", "four"}
second := []string{"two", "four"}

result := intersection(first, second) // or intersection(second, first)

func intersection(first, second []string) []string {
    out := []string{}
    bucket := map[string]bool{}

    for _, i := range first {
        for _, j := range second {
            if i == j && !bucket[i] {
                out = append(out, i)
                bucket[i] = true
            }
        }
    }

    return out
}
Roman
  • 11
  • 3
  • As it’s currently written, your answer is unclear. Please [edit] to add additional details that will help others understand how this addresses the question asked. You can find more information on how to write good answers [in the help center](/help/how-to-answer). – Timmy Jun 17 '22 at 07:48
  • the logic is outer loop to get each string from first slice, inner loop to compare the string from first slice to each string in second slice, if it is same string and hash map keyed by the string not exist, add the string to the new slice, set the hash map keyed by the string to true to avoid duplicate I guess – 99Linux Sep 27 '22 at 03:14
-8

Yes there are a few different ways to go about it.. Here's an example that can be optimized.

package main

import "fmt"

func intersection(a []string, b []string) (inter []string) {
    // interacting on the smallest list first can potentailly be faster...but not by much, worse case is the same
    low, high := a, b
    if len(a) > len(b) {
        low = b
        high = a
    }

    done := false
    for i, l := range low {
        for j, h := range high {
            // get future index values
            f1 := i + 1
            f2 := j + 1
            if l == h {
                inter = append(inter, h)
                if f1 < len(low) && f2 < len(high) {
                    // if the future values aren't the same then that's the end of the intersection
                    if low[f1] != high[f2] {
                        done = true
                    }
                }
                // we don't want to interate on the entire list everytime, so remove the parts we already looped on will make it faster each pass
                high = high[:j+copy(high[j:], high[j+1:])]
                break
            }
        }
        // nothing in the future so we are done
        if done {
            break
        }
    }
    return
}

func main() {
    slice1 := []string{"foo", "bar", "hello", "bar"}
    slice2 := []string{"foo", "bar"}
    fmt.Printf("%+v\n", intersection(slice1, slice2))
}

Now the intersection method defined above will only operate on slices of strings, like your example.. You can in theory create a definition that looks like this func intersection(a []interface, b []interface) (inter []interface), however you would be relying on reflection and type casting so that you can compare, which will add latency and make your code harder to read. It's probably easier to maintain and read to write a separate function for each type you care about.

func intersectionString(a []string, b []string) (inter []string),

func intersectionInt(a []int, b []int) (inter []int),

func intersectionFloat64(a []Float64, b []Float64) (inter []Float64), ..ect

You can then create your own package and reuse once you settle how you want to implement it.

package intersection

func String(a []string, b []string) (inter []string)

func Int(a []int, b []int) (inter []int)

func Float64(a []Float64, b []Float64) (inter []Float64)
reticentroot
  • 3,612
  • 2
  • 22
  • 39