lines = File.readlines("new text document.txt")
lines_count = lines.size
text = lines.join
no_of_chars = text.length

puts "number of lines: #{lines_count}"
puts "number of chars: #{no_of_chars}"

Hello, my objective is to count the number of chars in the text document. What I do not understand is why there is a need for lines.join. What is the program doing when you call lines.join? When I puts lines or puts lines.join, the program prints out exactly the same thing. Therefore what I did (and what I think is correct) is

no_of_chars = lines.length

which is obviously wrong, since by doing that no_of_chars ends up equal to the number of lines.

ndnenkov
roppo

5 Answers


When you do this

lines = File.readlines("new text document.txt")

you get an array of strings, e.g.:

lines #=> [
  "The surgeon lead over ...\n", # <- There's a newline at the end of each string
  "The medical gentleman ...\n",
]

There are as many entries in the array as there are lines in your text file. That's why you count the number of lines by doing:

lines_count = lines.size
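A small self-contained check of the above, assuming a throwaway temp file (hypothetical contents) in place of "new text document.txt". It shows that each element keeps its trailing newline, and that the chomp: true option (Ruby 2.4+) strips it:

```ruby
require 'tempfile'

# Hypothetical sample file standing in for "new text document.txt".
file = Tempfile.new('demo')
file.close
File.write(file.path, "first line\nsecond line\n")

lines   = File.readlines(file.path)              # each element keeps its trailing "\n"
chomped = File.readlines(file.path, chomp: true) # chomp: true strips the "\n"
file.unlink                                      # delete the temporary file

p lines      # prints ["first line\n", "second line\n"]
p chomped    # prints ["first line", "second line"]
p lines.size # prints 2, one element per line
```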

When you call lines.join, you are essentially concatenating all the strings together, one after another:

text = lines.join
text # => "The surgeon ... dress the infant"

To calculate the number of characters in a string, you just call length on it.

They look the same on the console because puts prints each element of an array on its own line, and since every element already ends with a newline, the output is identical to that of the joined string. To highlight the difference, you can call inspect on each of them:

puts lines.inspect
puts text.inspect
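For example, with a hypothetical two-line array standing in for the File.readlines result, inspect makes the difference between the Array and the single String visible:

```ruby
# Hypothetical stand-in for File.readlines output.
lines = ["first line\n", "second line\n"]
text  = lines.join

puts lines.inspect # prints ["first line\n", "second line\n"] (an Array of two Strings)
puts text.inspect  # prints "first line\nsecond line\n" (a single String)
puts lines.class   # prints Array
puts text.class    # prints String
```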
Eric Duminil
ariera
  • Hi, if I am concatenating all the strings together using `.join`, doesn't this mean that all the lines are combined without spacing? That is to say text #=> "The surgeon... Good-Night!'The medical gentleman... infant."? Since no parameter (in this case whitespace) is given to the method `.join` – roppo Mar 29 '17 at 12:10
  • You are right. If you want to have a whitespace in between each entry of the array you could do `lines.join(" ")` – ariera Mar 29 '17 at 12:12
  • However, I tried running `puts lines.join` in my console, and the lines are separated with spacing, even though I did not pass the parameter (whitespace) to the method `.join` – roppo Mar 29 '17 at 12:22
  • readlines doesn't strip newline characters, so those characters still exist in your individual lines. So when you join them, the lines will still have newlines between them. – David Stanley Mar 29 '17 at 12:31
  • @davidstanley Is there a Ruby doc that talks about it so I can read up more? – roppo Mar 29 '17 at 13:04
  • Sure, [the standard documentation for IO](http://ruby-doc.org/core-2.4.1/IO.html) – David Stanley Mar 29 '17 at 13:14
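To make the comment thread concrete: a minimal sketch with a hypothetical two-line array, showing that the newlines kept by readlines are what separate the lines after a plain join, where a separator argument ends up, and how to get one space-separated line instead:

```ruby
lines = ["first line\n", "second line\n"] # as File.readlines would return them

joined      = lines.join                   # no separator: the "\n" in each element keeps the lines apart
spaced      = lines.join(" ")              # the space lands after each "\n", at the start of the next line
single_line = lines.map(&:chomp).join(" ") # drop the newlines first, then join with spaces

p joined      # prints "first line\nsecond line\n"
p spaced      # prints "first line\n second line\n"
p single_line # prints "first line second line"
```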

It might be cleaner to do it the other way round, i.e. read the whole file as a single string via IO.read and split the lines afterwards using String#lines:

text = IO.read('document.txt')
no_of_chars = text.length
lines_count = text.lines.length

puts "number of lines: #{lines_count}"
puts "number of chars: #{no_of_chars}"

Note that String#length will count any character, including punctuation, spaces and newline characters.
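For instance, with a hypothetical string standing in for the file contents, length counts letters, punctuation, spaces and newlines alike; deleting the "\n" characters first gives the count without them:

```ruby
text = "Hi, you!\nBye.\n" # hypothetical file contents

p text.length              # prints 14: letters, punctuation, the space and both "\n"
p text.lines               # prints ["Hi, you!\n", "Bye.\n"]
p text.lines.length        # prints 2
p text.delete("\n").length # prints 12: the same count without newline characters
```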

Stefan

lines is an array. Printing an array with puts looks similar to printing the result of joining it:

numbers = ['11', '22', '33', '44']
puts numbers
 # 11
 # 22
 # 33
 # 44
puts numbers.join
 # 11223344

If each element ends with a newline (as is the case when you just got the lines from a file), you wouldn't be able to tell the two apart. Yet they are different:

numbers.length      # => 4
numbers.join.length # => 8

The length of the array tells you how many elements the array has: in your case, how many lines are in the file; in my case, how many numbers.

If you join the array, you concatenate the individual lines, so the length of the resulting string gives you the number of characters in the entire file: in my case, the number of digits.
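To see the "can't tell them apart" case directly, here is a sketch that captures what puts would print into a StringIO instead of the console. The helper name printed and the newline-terminated numbers array are illustrative assumptions:

```ruby
require 'stringio'

# Hypothetical helper: return what puts would print for obj, as a String.
def printed(obj)
  out = StringIO.new
  out.puts(obj)
  out.string
end

numbers = ["11\n", "22\n", "33\n", "44\n"] # each element ends in a newline, like lines read from a file

p printed(numbers) == printed(numbers.join) # prints true: identical console output
p numbers.length                            # prints 4: number of elements
p numbers.join.length                       # prints 12: 8 digits plus 4 newlines
```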

ndnenkov

What I do not understand is why is there a need to lines.join?

To get the full text (all items in the array) as one string.

And what is the program doing when you lines.join?

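Check the documentation for Array#join: it returns a string made by converting each element to a string and concatenating them, using the separator you pass (no separator by default).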

Because when I puts lines or puts lines.join the program prints out the exact same thing.

Because lines is an array: when you run puts lines, puts prints each element on its own line, whereas join() merges all array items into one string.

>> puts "fooo" 
#> fooo 
=> nil 
>> puts ["fooo"].join
#> fooo
=> nil

Therefore to get the number of chars in the text, why am I unable to just use no_of_chars = lines.length?

Because lines.length returns the length of the array, i.e. the number of elements (lines), not the number of characters.
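A one-element array makes the distinction easy to see; summing the per-element lengths (Array#sum with a block, Ruby 2.4+) is shown as an alternative that gives the same total without building the joined string:

```ruby
lines = ["fooo"]

p lines.length        # prints 1: Array#length counts elements
p lines.join.length   # prints 4: String#length counts characters
p lines.sum(&:length) # prints 4: same total, summed per element without joining
```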

Roman Kiselenko

Slurping a file isn't a good way to find the number of characters because it doesn't scale. Files in the GB range are easy to come by these days, especially in production environments, and slurping one loads the entire file into memory, which will hit your Ruby process very hard.

Instead, use this:

char_size = 0
File.foreach('path/to/file.txt') do |li|
  char_size += li.size
end

foreach reads the file line by line, which, for a file over 1MB, is as fast as or faster than using read or readlines, while still being scalable.
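The same idea can be written more compactly, since File.foreach without a block returns an Enumerator. This is a sketch: the count_chars name and the temp-file contents are hypothetical, used only to make it runnable:

```ruby
require 'tempfile'

# Hypothetical helper: sum character counts line by line. File.foreach reads
# one line at a time instead of loading the whole file into memory.
def count_chars(path)
  File.foreach(path).sum(&:length)
end

# Hypothetical sample file for the demo.
file = Tempfile.new('demo')
file.close
File.write(file.path, "first line\nsecond line\n")

char_size = count_chars(file.path)
file.unlink

p char_size # prints 23 (11 + 12, newlines included)
```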

See "Why is 'slurping' a file not a good practice?" for more information.

If you know the file contains only single-byte characters, such as the traditional ASCII, ISO-8859 or Windows-1252 character sets, you can do it even faster using File.size('path/to/file.txt').

Using size doesn't require reading the file at all, so it is much faster than any solution that actually opens and reads the content.
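The single-byte caveat matters because File.size reports bytes, not characters. A sketch with a hypothetical UTF-8 file containing "é" (2 bytes in UTF-8) shows the two counts diverging:

```ruby
require 'tempfile'

# Hypothetical sample file with one multibyte character: "é" is 2 bytes in UTF-8.
file = Tempfile.new('demo')
file.close
File.binwrite(file.path, "caf\u00e9\n")

bytes = File.size(file.path)                           # size on disk, in bytes
chars = File.read(file.path, encoding: 'UTF-8').length # characters after decoding
file.unlink

p bytes # prints 6
p chars # prints 5
```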

Community
the Tin Man