
I'd like to get a word list from a text file using Ruby. I found out how to use a regex to extract only the words here, so I wrote the following script:

src = File.open("text.txt")
word_list = []
src.each do |line|
  word_list << line.downcase.split(/[^[:alpha:]]/).delete_if {|x| x == ""}
end
word_list.flatten!.uniq!.sort!
p word_list

And the following is a sample text file text.txt:

TextMate may be the latest craze for developing Ruby on Rails applications, but Vim is forever. This plugin offers the following features for Ruby on Rails application development.

  1. Automatically detects buffers containing files from Rails applications, and applies settings to those buffers (and only those buffers). You can use an autocommand to apply your own custom settings as well.

  2. Unintrusive. Only files in a Rails application should be affected; regular Ruby scripts are left untouched. Even when enabled, the plugin should keep out of your way if you're not using its features.

  3. Easy navigation of the Rails directory structure. gf considers context and knows about partials, fixtures, and much more. There are two commands, :A (alternate) and :R (related) for easy jumping between files, including favorites like model to migration, template to helper, and controller to functional test. For more advanced usage, :Rmodel, :Rview, :Rcontroller, and several other commands are provided.

As a Ruby novice, I'd like to learn better solutions to this problem: clearer, more concise, and closer to Ruby conventions.

Thanks for any advice and corrections.

philipjkim
  • Don't you think more clear and concise are contradictory? If you are a novice, I suggest you spend some time understanding the methods instead of reposting. – Subs Jun 05 '12 at 10:29
  • I don't think this is just a repost. What I want to know is a more Ruby-like way to solve this problem, not how to split words using a regex. – philipjkim Jun 05 '12 at 10:32
  • What exactly is your question then? I see some code.... – Subs Jun 05 '12 at 10:34
  • Your input file is missing. Does it include hyphenated words like `word-list`? – Subs Jun 05 '12 at 10:37
  • I've changed the title of this question to express its purpose clearly. I've also added the input text file to the question. There are no hyphenated words in the text file. – philipjkim Jun 05 '12 at 10:45
  • Do these words include the following: `:A`, `you're`? If so, your regular expression is wrong. – Subs Jun 05 '12 at 10:53
  • ":A" is split at the colon, since a colon is not an alphabetic character, so I get "A" as a word element. If I don't want "A" to be in the list, the regex is wrong, as you mentioned. Am I getting it right? – philipjkim Jun 05 '12 at 11:03
  • let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/12163/discussion-between-subs-and-philipjkim) – Subs Jun 05 '12 at 13:14

2 Answers


More idiomatic code would be:

# File.foreach yields each line and closes the file when it is done
word_list = File.foreach("text.txt")
  .flat_map { |line| line.downcase.split(/[^[:alpha:]]/).reject(&:empty?) }
  .uniq
  .sort
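
As a variation on the same idea, and only as a sketch: `String#scan` with `/[[:alpha:]]+/` collects the runs of letters directly, so there are no empty strings to reject afterward. The helper name `unique_words` and the sample string are made up for illustration.

```ruby
# Hypothetical helper: returns the sorted, unique, lowercased words in text.
def unique_words(text)
  # scan returns every maximal run of alphabetic characters
  text.downcase.scan(/[[:alpha:]]+/).uniq.sort
end

sample = "Vim is forever. Use :A and :R, if you're not using Vim."
p unique_words(sample)
# => ["a", "and", "forever", "if", "is", "not", "r", "re", "use", "using", "vim", "you"]
```

For a file, the same helper could be called as `unique_words(File.read("text.txt"))`, assuming the file is small enough to read in one go. Note that, as with the `split` version, `:A` yields "a" and `you're` yields "you" and "re".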
tokland
# Iterate over each line and collect the results
word_list = File.open("text.txt").each_line.collect do |line|
  # collect gathers the block's return values, so no explicit << is needed
  # .reject(&:empty?) calls .empty? on each element and drops the empty strings
  line.downcase.split(/[^[:alpha:]]/).reject(&:empty?)
# you can chain methods onto a block as well; the non-bang methods are used
# here because flatten! and uniq! return nil when they change nothing
end.flatten.uniq.sort

p word_list
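
One caveat about chaining, shown as a small sketch with made-up data: the bang variants `uniq!` and `flatten!` return `nil` when they make no change, so a chain such as `.flatten!.uniq!.sort!` can raise `NoMethodError` on input that happens to have no duplicates. The non-bang variants always return an array.

```ruby
words = ["b", "a"]     # no duplicates in this array

p words.uniq!          # => nil, because nothing was removed
p words.uniq.sort      # => ["a", "b"]

begin
  words.uniq!.sort!    # calls sort! on nil
rescue NoMethodError => e
  puts "chain failed: #{e.class}"
end
```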
Reactormonk
  • I got an error: `undefined method 'each_line' for File:Class (NoMethodError)` Am I doing something wrong? (FYI I use ruby1.9.3) – philipjkim Jun 05 '12 at 10:58