45

I have an array of ids

a1 = [1, 2, 3, 4, 5]  

and I have another array of objects with ids in random order

a2 = [(obj_with_id_5), (obj_with_id_2), (obj_with_id_1), (obj_with_id_3), (obj_with_id_4)]  

Now I need to sort a2 according to the order of ids in a1. So a2 should now become:

[(obj_with_id_1), (id_2), (id_3), (id_4), (id_5)]  

a1 might be [3, 2, 5, 4, 1] or in any order but a2 should correspond to the order of ids in a1.

I do like this:

a1.each_with_index do |id, idx|
  found_idx = a1.find_index { |c| c.id == id }
  replace_elem = a2[found_idx]
  a2[found_idx] = a2[idx]
  a2[idx] = replace_elem
end  

But this still might run into an O(n^2) time if order of elements of a2 is exactly reverse of a1. Can someone please tell me the most efficient way of sorting a2?

Ari53nN3o
  • 1,202
  • 2
  • 14
  • 21

4 Answers4

88

I'll be surprised if anything is much faster than the obvious way:

a2.sort_by{|x| a1.index x.id}
pguardiario
  • 53,827
  • 19
  • 119
  • 159
  • assuming a1 is sorted (and it is from the problem stmt) and the container you're using for a1 can take advantage of the fact that a1 is sorted then I agree this would be faster than O(n^2). – Jato Aug 15 '12 at 05:36
  • 1
    No a1 being sorted is not an advantage, I'm not sure why you would think that. This way is fast because it's built-in. Trying to beat built-in sort_by seems a waste of time to me. – pguardiario Aug 15 '12 at 06:24
  • a1 being sorted is an advantage. If it is sorted then the index operation should run in O(log n) time (assuming binary search) and if it is not sorted the index will run in O(n) time. – Jato Aug 15 '12 at 13:38
  • If a1 is sorted and you provide an index method that takes advantage of this fact to return the index via binary search rather than iteration, then the index operation should run in O(log n) time vs O(n) time. What is incorrect? – Jato Aug 15 '12 at 15:19
  • 6
    This method is blazing fast too but using hashes is like travelling faster than the speed of light. I ran a test with both methods for 10,000 numbers (just for the sake of testing). Your method took 1.3secs on an avg but with hashes it took 0.009secs on avg.. – Ari53nN3o Aug 15 '12 at 15:35
  • 1
    -1 for this method. For Example: x - Array of 1000 elements, not sorted x2 - Array of same elements, sorted `Benchmark.bm { |t| t.report('test1') { x.index_by { |c| c }.values_at(*x2).compact } t.report('test2') { x.sort_by { |v| x2.index v } } }` test1 real: 0.000709 test2 real: 0.048563 – isqad Mar 05 '15 at 21:13
  • This one did the trick for me. In my case the ordering array was smaller than the one to be ordered, so I did: a2.sort_by{|x| a1.index(x.id) || a1.size } to make the remaining elements keep their order. Thanks for your answer, man! – Mauricio Moraes Aug 14 '15 at 13:01
  • The given solution is O(n^2) even if a1 is sorted because Array#index doesn't do a binary search. Ruby 2.3 added `bsearch_index` so if you switch to that, then it's index itself is O(logn). Given this is done n times for a2, it will be O(nlogn). @isqad's solution is better. But it creates 3 objects. 1. index_by hash, 2. values_at array 3. compact another array. You can simplify this by using `compact!`. So it's space vs time complexity at this point. – Bibek Shrestha Oct 04 '17 at 17:46
27
hash_object = objects.each_with_object({}) do |obj, hash| 
  hash[obj.object_id] = obj
end

[1, 2, 3, 4, 5].map { |index| hash_object[index] }
#=> array of objects in id's order

I believe that the run time will be O(n)

megas
  • 21,401
  • 12
  • 79
  • 130
  • 1
    I believe this would be O(n^2). the actual sort is O(n), but the preparation step would make it n^2 – Jato Aug 15 '12 at 05:34
  • 3
    I'm not agree, to build hash table require O(n), look here http://en.wikipedia.org/wiki/Hash_table – megas Aug 15 '12 at 05:49
  • 3
    Yes, building the hash table is O(n) time. And the sort is O(n) time. So you have 2xO(n)... hmmm... that would be less than n^2. I stand corrected. good catch! – Jato Aug 15 '12 at 13:41
  • The first step seems the same as using `hash_object = objects.index_by(&:object_id)` – ozzyaaron Jul 25 '14 at 15:17
  • @kamal: It's O(n), but doesn't do what's asked -- it'll return `[nil, nil, nil, nil, nil]` unless the object_ids happen to be the numbers 1 through 5. To make it work, you need to get the object_ids and sort them, which won't be any better than `objects.index_by(&:object_id)`. Also, it isn't necessary to explain the O(n) claim here, but note that the O(n log n) lower bound only applies to comparison sorts. – johncip Feb 13 '17 at 07:46
  • Note that lookup on hash is ~O(1), thats why search step behaves like O(n), but: - This will take an extra O(n) memory - Building step will incrementally increase hash size (unless you previously reserve this space) and increasing a hash size is about O(n), making creation step something like O(nlog(n)) (hashes used to increase its size exponentially, thats why increase operation will occur like log(n) times) Anyway, this seems to be best option, but be aware it seems to use O(nlog(n)) ops and O(n) aditional memory – John Owen Chile Apr 12 '17 at 21:07
  • so `O(2n)`? Why does it matter -- who really cares about big-O notation? – BotNet Mar 09 '18 at 14:40
20

I like the accepted answer, but in ActiveSupport there is index_by which makes creating the initial hash even easier. See Cleanest way to create a Hash from an Array

In fact you could do this in one line since Enumerable supports index_by as well:

a2.index_by(&:id).values_at(*a1)
Community
  • 1
  • 1
Eric Woodruff
  • 6,380
  • 3
  • 36
  • 33
  • This only works if you don't have any duplicates in your original list. Index by will overwrite any duplicate ids. This may or may not be an issue for you. – Josh Feb 04 '16 at 20:20
7

Inspired by Eric Woodruff's Answer, I came up with the following vanilla Ruby solution:

a2.group_by(&:object_id).values_at(*a1).flatten(1)

Method documentation:

Community
  • 1
  • 1
Ajedi32
  • 45,670
  • 22
  • 127
  • 172
  • I like this solution best. It's efficient (and I suspect more efficient than @pguardiario's solution) and, importantly, permits two elements of `a2` to have the same value for "id". The question does not state that the id's are unique, yet some answers, including the one selected, depend on the id's being unique. – Cary Swoveland Oct 08 '17 at 18:07