Here's how I would do it.
Method
def anagrams(list, word)
ltr_freq = word.each_char.tally
list.select { |w| w.size == word.size && w.each_char.tally == ltr_freq }
end
Example
list = ['helo', 'elloh', 'heelo', 'llohe']
anagrams(list, 'hello')
#=> ["elloh", "llohe"]
Computational complexity
For practical purposes, the computational complexity of computing w.each_char.tally
for any word w
of length n
can be regarded as O(n
). That's because hash key lookups are nearly constant-time. It follows that the computational complexity of determining whether a word of length n
is an anagram of another word of the same length can be regarded as O(n
).
This compares with methods that sort the letters of a word, which have a computational complexity of O(n*log(n)), n being the word length.
Explanation
See Enumerable#tally. Note that w.each_char.tally
is not computed when w.size == word.size
is false
.
Now let's add some puts
statements to see what is happening.
def anagrams(list, word)
ltr_freq = word.each_char.tally
puts "ltr_freq = #{ltr_freq}"
list.select do |w|
puts "\nw = '#{w}'"
if w.size != word.size
puts "words differ in length"
false
else
puts "w.each_char.tally = #{w.each_char.tally}"
if w.each_char.tally == ltr_freq
puts "character frequencies are the same"
true
else
puts "character frequencies differ"
false
end
end
end
end
anagrams(list, 'hello')
ltr_freq = {"h"=>1, "e"=>1, "l"=>2, "o"=>1}
w = 'helo'
words differ in length
w = 'elloh'
w.each_char.tally = {"e"=>1, "l"=>2, "o"=>1, "h"=>1}
character frequencies are the same
w = 'heelo'
w.each_char.tally = {"h"=>1, "e"=>2, "l"=>1, "o"=>1}
character frequencies differ
w = 'llohe'
w.each_char.tally = {"l"=>2, "o"=>1, "h"=>1, "e"=>1}
character frequencies are the same
#=>["elloh", "llohe"]
Possible improvement
A potential weakness of the expression
w.each_char.tally == ltr_freq
is that the frequencies of all unique letters in the word w
must be determined before a conclusion is reached, even if, for example, the first letter of w
does not appear in the word word
. We can remedy that as follows.
def anagrams(list, word)
ltr_freq = word.each_char.tally
list.select do |w|
next false unless w.size == word.size
ltr_freqs_match?(w, ltr_freq.dup)
end
end
def ltr_freqs_match?(w, ltr_freq)
w.each_char do |c|
return false unless ltr_freq.key?(c)
ltr_freq[c] -= 1
ltr_freq.delete(c) if ltr_freq[c].zero?
end
true
end
One would have to test this variant against the original version of anagrams
above to determine which tends to be fastest. This variant has the advantage that it terminates (short-circuits) the comparison as soon as is found that the cumulative count of a given character in the word w
is greater than the total count of the same letter in word
. At the same time, tally
is written in C so it may still be faster.