Find a non empty intersection using the maximum number of elements in a bidimensional array

Question

I have a bidimensional array :

   keys = [[:reference],
     [:parent_ref, :kind],
     [:kind, :parent_ref, :reference],
     [:parent_ref, :kind, :status]]

Let's say I want the intersection of all those arrays. I can do :

keys.reduce{|arr, acc| arr & acc}

which will result in [] because there is no universal common key. Now let's say I want to find a "non empty intersection" using the maximum number of elements in the array. For instance the intersection using this "method" would be [:parent_ref, :kind] because it is the intersection of

[[:parent_ref, :kind],
         [:kind, :parent_ref, :reference],
         [:parent_ref, :kind, :status]]

We just have to put [:reference] aside.

How would you approach/create such an algorithm.

Not sure if I understand the problem and your example. What do you mean by _"using the maximum number of elements"_? Can you explain it in a different way? — Stefan, Jun 04 '19 at 11:33
To me it seems like elements are the inner arrays in this case. So he is looking for the intersection with the highest number of involved arrays. See the red one in this example: https://i.imgur.com/bxLBqf3.png — claasic, Jun 04 '19 at 11:48

iGian · Answer 1 · 2019-06-04T13:40:17.513

0

Brute force, using permutations:

keys.combination(2).map { |a, b| a & b }.max_by(&:size)
#=> [:parent_ref, :kind]

Finding the elements used, also brute force:

res = [:parent_ref, :kind]
keys.select { |e| e.sort & res.sort == res.sort }
#=> [[:parent_ref, :kind], [:kind, :parent_ref, :reference], [:parent_ref, :kind, :status]]

edited Jun 04 '19 at 13:40

answered Jun 04 '19 at 12:17

iGian

11,023
3
21
36

score 0 · Answer 2 · answered Jun 04 '19 at 13:09

Suppose the bidimensional array is an array of strings where each string is an array of characters. Suppose we have the strings:

CDB
BC
CBA
FDE
EDBC

First sort every string in increasing order.

BCD
BC
ABC
DEF
BCDE

Then sort the strings.

ABC
BC
BCD
BCDE
DEF

For each element, find the one that appears in the longest sequence of consecutive strings. In our example, it would be B and C. Their consecutive strings is the largest number of strings that have a non-empty intersection.

score 0 · Answer 3 · answered Jun 04 '19 at 14:01

I believe your question is linked to this one, with this answer being relevant.

You can translate your input to shorter names for readability:

matrix = [[:a], [:b, :c], [:c, :b, :a], [:b, :c, :d]]

You can iterate over the values, and keep a Hash containing all the sets each value is in:

matrix.each.with_index(1) do |row, i|
  row.each do |value|
    grouped_values[value] << i
  end
end

p grouped_values
# => {:a=>[1, 3], :b=>[2, 3, 4], :c=>[2, 3, 4], :d=>[4]}

So :a is present in sets 1 and 3, b in sets 2, 3 and 4...

You can then group the sets together:

grouped_sets = Hash.new{|h, k| h[k] = []}

grouped_values.each do |value, sets|
  grouped_sets[sets] << value if sets.size > 1
end

p grouped_sets
# => {[1, 3]=>[:a], [2, 3, 4]=>[:b, :c]}

So the intersection of sets 1 and 3 is [:a], and the intersection of sets 2, 3, and 4 is [:b, :c].

Finally, you can either pick the intersection with the most sets or with the most elements.

I think you are missing some intersections here : say you have ```b=>[2,3,4]``` and ```c=>[2,3,4,5]``` this solution will miss ```[2,3,4] => [:b, :c]``` — David Geismar, Jun 05 '19 at 09:46
@DavidGeismar: It seems you're right. It finds the intersection with the most subsets, not with the most elements. — Eric Duminil, Jun 05 '19 at 15:26

Cary Swoveland · Answer 4 · 2019-06-04T18:37:58.370

arr = [
  [:reference],
  [:parent_ref, :kind],
  [:kind, :parent_ref, :reference],
  [:parent_ref, :kind],
  [:parent_ref, :kind, :status],
  [:kind, :parent_ref, :kind]
]

Note that I've modified keys given in the example to include two types of duplicates.

require 'set'

h = arr.each_with_object(Hash.new { |h,k| h[k] = [] }) { |a,h| h[a.to_set] << a }
  #=> {#<Set: {:reference}>=>[[:reference]],
  #    #<Set: {:parent_ref, :kind}>=>[[:parent_ref, :kind],
  #             [:parent_ref, :kind], [:kind, :parent_ref, :kind]],
  #    #<Set: {:kind, :parent_ref, :reference}>=>[[:kind, :parent_ref, :reference]],
  #    #<Set: {:parent_ref, :kind, :status}>=>[[:parent_ref, :kind, :status]]} 
g = h.each_key.with_object({}) do |k,g|
  g[k] = h[k].dup
  h.each { |kk,v| g[k].concat(v) if k < kk }
end 
  #=> {#<Set: {:reference}>=>[[:reference], [:kind, :parent_ref, :reference]],
  #    #<Set: {:parent_ref, :kind}>=>[[:parent_ref, :kind], [:parent_ref, :kind],
  #             [:kind, :parent_ref, :kind], [:kind, :parent_ref, :reference],
  #             [:parent_ref, :kind, :status]],
  #    #<Set: {:kind, :parent_ref, :reference}>=>[[:kind, :parent_ref, :reference]],
  #    #<Set: {:parent_ref, :kind, :status}>=>[[:parent_ref, :kind, :status]]} 
a = g.max_by { |k,v| v.size }
  #=> [#<Set: {:parent_ref, :kind}>, [[:parent_ref, :kind], [:parent_ref, :kind],
  #     [:kind, :parent_ref, :kind], [:kind, :parent_ref, :reference],
  #     [:parent_ref, :kind, :status]]] 
[a.last.first, a.last.drop(1)]
  #=> [[:parent_ref, :kind], [[:parent_ref, :kind], [:kind, :parent_ref, :kind],
  #    [:kind, :parent_ref, :reference], [:parent_ref, :kind, :status]]]

This shows the array [:parent_ref, :kind], when converted to a set, is a subset of the following other elements of array after being converted to sets:

[[:parent_ref, :kind], [:kind, :parent_ref, :kind],
 [:kind, :parent_ref, :reference], [:parent_ref, :kind, :status]]

and that no other element of arr (after all elements of arr are converted to sets) has a greater number of supersets.

Note that, if desired, the calculations of g and a can be chained.

See Set#<.

If neither arr nor any element of arr may contain duplicates, the calculations are of course simplified.

arr = [
  [:reference],
  [:parent_ref, :kind],
  [:kind, :parent_ref, :reference],
  [:parent_ref, :kind, :status]
]

require 'set'

sets = arr.map(&:to_set)
  #=> [#<Set: {:reference}>, #<Set: {:parent_ref, :kind}>,
  #    #<Set: {:kind, :parent_ref, :reference}>, #<Set: {:parent_ref, :kind, :status}>]
h = sets.each_with_object(Hash.new { |h,k| h[k] = [] }) { |s,h|
       sets.each { |ss| h[s] << ss if s < ss } }
  #=> {#<Set: {:reference}>=>[#<Set: {:kind, :parent_ref, :reference}>],
  #    #<Set: {:parent_ref, :kind}>=>[#<Set: {:kind, :parent_ref, :reference}>,
  #    #<Set: {:parent_ref, :kind, :status}>]}
k, v = h.max_by { |_,v| v.size }
  #=> [#<Set: {:parent_ref, :kind}>,
  #     [#<Set: {:kind, :parent_ref, :reference}>, #<Set: {:parent_ref, :kind, :status}>]]
[k.to_a, v.map(&:to_a)]
  #=> [[:parent_ref, :kind],
  #    [[:kind, :parent_ref, :reference], [:parent_ref, :kind, :status]]]

Find a non empty intersection using the maximum number of elements in a bidimensional array

4 Answers4