12

Sample input:

"I was 09809 home -- Yes! yes!  You was"

and output:

{ 'yes' => 2, 'was' => 2, 'i' => 1, 'home' => 1, 'you' => 1 }

My code that does not work:

def get_words_f(myStr)
    myStr=myStr.downcase.scan(/\w/).to_s;
    h = Hash.new(0)
    myStr.split.each do |w|
       h[w] += 1 
    end
    return h.to_a;
end

print get_words_f('I was 09809 home -- Yes! yes!  You was');
Andrew Marshall
Ben

7 Answers

20

This works but I am kinda new to Ruby too. There might be a better solution.

def count_words(string)
  words = string.split(' ')
  frequency = Hash.new(0)
  words.each { |word| frequency[word.downcase] += 1 }
  return frequency
end

Instead of .split(' '), you could also use .scan(/\w+/); however, .scan(/\w+/) splits "aren't" into "aren" and "t", while .split(' ') keeps it whole.
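To make the difference concrete, here is a small sketch that runs both tokenizers on the same text. The sample sentence and the tally_words helper are made up for illustration and are not part of the answer above.

```ruby
# Hypothetical helper (not from the answer): count downcased tokens.
def tally_words(tokens)
  tokens.each_with_object(Hash.new(0)) { |word, counts| counts[word.downcase] += 1 }
end

text = "Aren't you home? Yes! yes!"

# split(' ') only breaks on whitespace, so punctuation stays attached to words.
by_split = tally_words(text.split(' '))
# keys: "aren't", "you", "home?", "yes!"

# scan(/\w+/) keeps only runs of word characters, so the apostrophe
# cuts "Aren't" into two tokens and the punctuation is dropped.
by_scan = tally_words(text.scan(/\w+/))
# keys: "aren", "t", "you", "home", "yes"
```

Which one fits depends on whether contractions or attached punctuation matter more for the text being counted.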

Output of your example code:

print count_words('I was 09809 home -- Yes! yes!  You was');

#{"i"=>1, "was"=>2, "09809"=>1, "home"=>1, "--"=>1, "yes!"=>2, "you"=>1}
stakx - no longer contributing
emre nevayeshirazi
  • There's no need to use 'return', just frequency – megas Mar 12 '12 at 22:34
  • I know, but I think return makes it easier to read and understand. Maybe because I am coming from Java, C++ ... – emre nevayeshirazi Mar 12 '12 at 22:40
  • what if `frequency[word.downcase]` doesn't exist? – Incerteza Nov 18 '14 at 03:24
  • @AleksanderPohl This was true when you wrote it, but [ruby `2.4+` has added support for non-ascii case conversion](https://bugs.ruby-lang.org/issues/10085). @アレックス That's what the `Hash.new(0)` is for: it's specifying a default value of `0`. – Tom Lord Jun 07 '17 at 12:24
9
def count_words(string)
  string.scan(/\w+/).reduce(Hash.new(0)){|res,w| res[w.downcase]+=1;res}
end

Second variant:

def count_words(string)
  string.scan(/\w+/).each_with_object(Hash.new(0)){|w,h| h[w.downcase]+=1}
end
Tom Lord
megas
6
def count_words(string)
  Hash[
    string.scan(/[a-zA-Z]+/)
      .group_by{|word| word.downcase}
      .map{|word, words|[word, words.size]}
  ]
end

puts count_words 'I was 09809 home -- Yes! yes!  You was'
  • I like the Hash[] syntax :-) +1 – christianblais Mar 12 '12 at 22:04
  • @christianblais I do too, but I sorta feel like I shouldn't need it in this case. In my projects, I usually add `map_hash` to `Enumerable`, which bakes together `map` and `Hash[]`. –  Mar 12 '12 at 22:08
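The map_hash helper mentioned in the last comment is not shown anywhere in the thread. A minimal sketch of what it might look like, assuming it simply wraps map in Hash[], is below; the name comes from the comment, but the implementation is a guess.

```ruby
module Enumerable
  # Assumed implementation: map each element to a [key, value] pair
  # and build a Hash from the resulting pairs.
  def map_hash(&block)
    Hash[map(&block)]
  end
end

def count_words(string)
  string.scan(/[a-zA-Z]+/)
        .group_by { |word| word.downcase }
        .map_hash { |word, words| [word, words.size] }
end

counts = count_words('I was 09809 home -- Yes! yes!  You was')
```

Reopening Enumerable affects every collection in the program, so this is a convenience for one's own projects rather than something to put in a shared library.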
3

This code will ask you for input and then find the word frequency for you:

puts "enter some text man"
text = gets.chomp
words = text.split(" ")
frequencies = Hash.new(0)
words.each { |word| frequencies[word.downcase] += 1 }
frequencies = frequencies.sort_by {|a, b| b}
frequencies.reverse!
frequencies.each do |word, frequency|
    puts word + " " + frequency.to_s 
end
Drew
2

This works, and ignores the numbers:

def get_words(my_str)
    my_str = my_str.scan(/\w+/)
    h = Hash.new(0)
    my_str.each do |s|
        s = s.downcase
        if s !~ /^[0-9]*\.?[0-9]+$/ 
            h[s] += 1
        end
    end
    return h
end

print get_words('I was there 1000 !')
puts
Ambidextrous
2

You can look at my code that splits the text into words. The basic code would look as follows:

sentence = "Ala ma kota za 5zł i 10$."
splitter = SRX::Polish::WordSplitter.new(sentence)
histogram = Hash.new(0)
splitter.each do |word,type|
  histogram[word.downcase] += 1 if type == :word
end
p histogram

You should be careful if you wish to work with languages other than English, since in Ruby 1.9 downcase does not work as you might expect for letters such as 'Ł'.
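As a comment earlier in the thread points out, Ruby 2.4 and later convert the case of non-ASCII letters as well. The sketch below assumes Ruby 2.4 or newer and a UTF-8 source file; the sample words are made up for illustration.

```ruby
# Assumes Ruby 2.4+ (non-ASCII case conversion) and UTF-8 source encoding.
word = "ŁÓDŹ"

# Default downcase converts non-ASCII letters too.
full = word.downcase               # "łódź"

# The :ascii option limits conversion to A-Z, close to the older behaviour.
ascii_only = word.downcase(:ascii) # "ŁÓdŹ"

# With full conversion, differently cased spellings land in one bucket.
histogram = Hash.new(0)
["Łódź", "ŁÓDŹ", "łódź"].each { |w| histogram[w.downcase] += 1 }
```

On an older Ruby the three spellings would end up under separate keys, which is the problem the answer warns about.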

Aleksander Pohl
2
class String
  def frequency
    self.scan(/[a-zA-Z]+/).each.with_object(Hash.new(0)) do |word, hash|
      hash[word.downcase] += 1
    end
  end
end

puts "I was 09809 home -- Yes! yes! You was".frequency

christianblais