This is a very simple question: which items appear in the list more than once?
array = ["mike", "mike", "mike", "john", "john", "peter", "clark"]
The correct answer is ["mike", "john"].
Seems like we can just do:
array.select { |e| array.count(e) > 1 }.uniq
Problem solved. But wait! What if the array is REALLY big:
1_000_000.times { array.concat("1234567890abcdefghijklmnopqrstuvwxyz".split('')) }
It just so happens I need to figure out how to do this in a reasonable amount of time. We're talking millions and millions of records.
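The select/count approach rescans the whole array once per element, so its cost grows quadratically with the array size. One direction that might scale better is to count every element in a single pass with a Hash and then keep the keys whose count is above one. This is only a minimal sketch: `duplicates` is a hypothetical helper name, and it assumes the distinct values fit in memory.

```ruby
# Hypothetical helper: returns the elements that occur more than once.
# One pass to count, one pass over the distinct values to filter.
def duplicates(items)
  counts = Hash.new(0)                        # missing keys start at 0
  items.each { |item| counts[item] += 1 }     # tally each element
  counts.select { |_item, count| count > 1 }.keys
end

array = ["mike", "mike", "mike", "john", "john", "peter", "clark"]
p duplicates(array) # prints ["mike", "john"]
```

Because Ruby hashes keep insertion order, the result lists duplicates in the order they first appeared.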
For what it's worth, this massive array is actually the concatenation of 10-20 smaller arrays. If it's easier to compare those, let me know. I'm stumped.
We're talking 10,000 to 10,000,000 lines per file, hundreds of files.
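Since the data lives in files, it may not be necessary to build the giant array at all. The sketch below streams each file line by line and tracks two sets: lines seen so far, and lines seen at least twice. It is written under assumptions that are not stated above: a "duplicate" means a line that occurs more than once across all files combined, the set of distinct lines fits in memory, and Ruby is 2.4 or newer (for `chomp: true`). The name `duplicate_lines` and the example glob path are hypothetical.

```ruby
require "set"

# Hypothetical helper: returns every line that occurs more than once
# across the given files, without loading whole files into memory.
def duplicate_lines(paths)
  seen  = Set.new   # every distinct line encountered so far
  dupes = Set.new   # lines encountered at least twice
  paths.each do |path|
    File.foreach(path, chomp: true) do |line|
      # Set#add? returns nil when the line was already present
      dupes << line unless seen.add?(line)
    end
  end
  dupes.to_a
end

# Example call (the path is an assumption, adjust to the real files):
# duplicate_lines(Dir.glob("data/*.txt"))
```

If the distinct lines do not fit in memory, this approach would not work as written, and sorting the files externally before comparing neighbours might be a better fit.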