2

I'm trying to write a script that counts the number of words in a text, with some exceptions that are described by regular expressions.

The script looks as follows:

number_of_words = 0
standalone_number = /\A[-+]?[0-9]*\.?[0-9]+\Z/
standalone_letter = /\A([\w+\-].?)+@[a-z0-9\-]+(\.[a-z]+)*\.[a-z]+\Z/
email_address = /\A([\w+\-].?)+@[a-z0-9\-]+(\.[a-z]+)*\.[a-z]+\Z/
text.each_line(){ |line| number_of_words = number_of_words + line.split.size {|word| word !~ standalone_number and word !~ standalone_letter and word !~ email_address  } }
puts number_of_words

As you can see, I don't want to include standalone numbers, letters, or email addresses in the word count.

When I read a text file containing this information:

1 2 ruby email@email.com

I got a word count of 4, while I was expecting to get 1 (only "ruby" should be included in the count).

What am I missing here?

Thanks.

EDIT

I fixed the "standalone_letter" regular expression; it had mistakenly been written the same as the "email_address" regular expression.

I have solved the issue using a solution that I have added to the answers.

Simplicity
  • A very similar question was asked a couple of days ago http://stackoverflow.com/questions/31146079/counting-words-in-ruby-with-some-exceptions/31151986#31151986 – infused Jul 03 '15 at 01:19

4 Answers

2

Array#size doesn't take a block like that. You're looking for Array#count.

line.split.count { ... } 
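To illustrate the difference, here is a minimal sketch using the sample line from the question. Only the standalone_number pattern is applied, and the method names are hypothetical, chosen just for this example. Ruby silently ignores a block passed to a method that never yields, which is why size raises no error and simply returns the full length.

```ruby
STANDALONE_NUMBER = /\A[-+]?[0-9]*\.?[0-9]+\Z/

# Array#size takes no block; the block given here is silently ignored.
def size_with_block(words)
  words.size { |word| word !~ STANDALONE_NUMBER }
end

# Array#count with a block counts only the elements for which the block is truthy.
def count_with_block(words)
  words.count { |word| word !~ STANDALONE_NUMBER }
end

words = "1 2 ruby email@email.com".split
puts size_with_block(words)   # 4 (every element, block ignored)
puts count_with_block(words)  # 2 ("ruby" and "email@email.com")
```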

Also, just a thought: instead of looping through the lines of the text and incrementing a counter, it looks like you could just call split directly on your original text, line breaks and all, and get the same result.
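As a rough sketch of that idea: String#split with no arguments splits on any run of whitespace, including newlines, so counting over the whole text should match the per-line total. The method names and the sample text below are made up for illustration, and only the number pattern is used.

```ruby
NUMBER_PATTERN = /\A[-+]?[0-9]*\.?[0-9]+\Z/

# Per-line approach: split each line and add up the counts.
def count_per_line(text)
  text.each_line.sum { |line| line.split.count { |word| word !~ NUMBER_PATTERN } }
end

# Whole-text approach: split once; newlines are treated as whitespace.
def count_whole_text(text)
  text.split.count { |word| word !~ NUMBER_PATTERN }
end

sample = "1 2 ruby\nemail@email.com 3.5 words\n"
puts count_per_line(sample)    # 3
puts count_whole_text(sample)  # 3
```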

Ben Lee
1

The problem is that you are using size, which returns the number of elements in the array and does not accept a block. Use count instead and everything will work.

A much cleaner solution looks like this:

standalone_number = /\A[-+]?[0-9]*\.?[0-9]+\Z/
standalone_letter = /\A([\w+\-].?)+@[a-z0-9\-]+(\.[a-z]+)*\.[a-z]+\Z/
email_address = /\A([\w+\-].?)+@[a-z0-9\-]+(\.[a-z]+)*\.[a-z]+\Z/

text = file.read
num_of_words = text.split.count{ |word| [standalone_number, standalone_letter, email_address].none?{ |regexp| word =~ regexp } }

puts num_of_words
Nafaa Boutefer
  • Thanks for your reply. For your solution I got a count of "3" while I'm expecting a count of "1". – Simplicity Jul 03 '15 at 03:35
  • Are you sure that the regular expressions are correct and that they're what you really want? I think they are too complicated. Please provide more example inputs and their expected outputs so that I can correct my solution. – Nafaa Boutefer Jul 03 '15 at 04:20
  • `standalone_letter` and `email_address` are the same. By a standalone letter, do you mean a word of length 1? And by `standalone_number`, a single-digit number? – Nafaa Boutefer Jul 03 '15 at 04:24
0

You could also delete the matching words from the array as follows:

text.each_line(){ |line| number_of_words = number_of_words + line.split.delete_if {|word| word =~ standalone_number or word =~ standalone_letter or word =~ email_address }.size }
puts number_of_words

This will remove matching elements and then count the size of the array.
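A closely related sketch uses Array#reject, which returns a new array rather than modifying the receiver the way delete_if does. The constant and method names here are hypothetical, and only the number and email patterns from the question are included.

```ruby
SKIP_PATTERNS = [
  /\A[-+]?[0-9]*\.?[0-9]+\Z/,                        # standalone number
  /\A([\w+\-].?)+@[a-z0-9\-]+(\.[a-z]+)*\.[a-z]+\Z/  # email address
].freeze

# Returns the words of a line that match none of the skip patterns.
def countable_words(line)
  line.split.reject { |word| SKIP_PATTERNS.any? { |pattern| word =~ pattern } }
end

kept = countable_words("1 2 ruby email@email.com")
puts kept.inspect  # ["ruby"]
puts kept.size     # 1
```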

ShellFish
0

This works!

text = File.open('xyz.txt', 'r')
number_of_words = 0
standalone_number = /\A[-+]?[0-9]*\.?[0-9]+\Z/
standalone_letter = /^[a-zA-Z]$/
email_address = /\A([\w+\-].?)+@[a-z0-9\-]+(\.[a-z]+)*\.[a-z]+\Z/
text.each_line(){ |line| number_of_words = number_of_words + line.split.count {|word|  word !~ standalone_number && word !~ standalone_letter && word !~  email_address }}
puts number_of_words
Simplicity