Code sample 1:

def count_lines1(file_name)
  open(file_name) do |file|
    count = 0
    while file.gets
      count += 1
    end
    count
  end
end

Code sample 2:

def count_lines2(file_name)
  file = open(file_name)
  count = 0
  while file.gets
   count += 1
  end
  count
end

I am wondering which of these is the better way to count the lines in a file, in terms of good Ruby style.

gogofan
  • 1
    The biggest difference is that they're not equivalent; sample 1 closes the file at the end of the block. Beyond that it's opinion. – Dave Newton Oct 06 '16 at 21:02
  • 1
    Use `File.readlines(file_name).count` – Sagar Pandya Oct 06 '16 at 21:10
  • 1
    Don't use `readlines` unless you KNOW the file is small. `readlines` reads the file contents into memory as an Array. If the file is too big you'll have DOSed yourself. Instead use `foreach` which is faster with big files. – the Tin Man Oct 06 '16 at 23:01
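
The difference noted in the first comment can be checked directly. This is a minimal sketch: `handle_states` is a hypothetical helper name, and the temporary file exists only to keep the example self-contained.

```ruby
require "tempfile"

# Hypothetical helper: reports whether each handle is closed,
# as [opened_with_block, opened_without_block].
def handle_states(path)
  block_handle = nil
  # Block form (as in sample 1): Ruby closes the file when the block exits.
  File.open(path) { |file| block_handle = file }

  # Non-block form (as in sample 2): the handle stays open until closed.
  plain_handle = File.open(path)
  states = [block_handle.closed?, plain_handle.closed?]
  plain_handle.close # sample 2 would need this call to avoid leaking the handle
  states
end

Tempfile.create("demo") do |tmp|
  tmp.write("a\nb\n")
  tmp.flush
  p handle_states(tmp.path) # prints [true, false]
end
```

If sample 2 is kept, wrapping the loop in `begin ... ensure file.close end` would be one way to release the handle.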

1 Answer

which is the better way to implement the counting of lines in a file.

Neither. Ruby can do it easily using foreach:

def count_lines(file_name)
  lines = 0
  File.foreach(file_name) { lines += 1 }
  lines
end

If I run that against my ~/.bashrc:

$ ruby test.rb
37

foreach is very fast and, because it reads the file one line at a time instead of loading it all into memory, it avoids scalability problems.
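
A more compact variant, offered as a sketch rather than as part of the original answer: called without a block, `File.foreach` returns an enumerator, and `count` walks it one line at a time, so no array of lines is built. The name `count_lines_enum` is made up for illustration.

```ruby
# Hypothetical helper: counts lines lazily via the enumerator form of foreach.
def count_lines_enum(file_name)
  File.foreach(file_name).count
end
```

Like the block form, this counts a final line that has no trailing newline.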

Alternatively, you could take advantage of tools in the OS, such as wc -l, which was written specifically for the task:

`wc -l .bashrc`.to_i

which will return 37 again. If the file is huge, wc will likely outrun doing it in Ruby because wc is written in compiled code.


You can also read in large chunks with read and count newline characters.

Yes, read will allow you to do that, but the scalability issue will remain. In my environment read or readlines can be a script killer because we often have to process files well into the tens of GB. There's plenty of RAM to hold the data, but the I/O suffers because of the overhead of slurping the data. "Why is "slurping" a file not a good practice?" goes into this.

An alternate way of reading in big chunks is to tell Ruby to read a set block size, count the line-ends in that block, and loop until the file is read completely. I didn't test that method in the above linked answer, but in the past I did similar things when I was writing in Perl and found that the difference wasn't worth it because it resulted in a bit more code. At that point, if all I was doing was counting lines, it'd make more sense to call wc -l and let it do the work, as it'd be a lot faster in coding time and most likely in execution time too.
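
For reference, the block-reading approach described above might look like the following in Ruby. It is an untested-for-speed sketch, not a benchmarked recommendation: the name `count_lines_chunked` is hypothetical, and the 64 KB default is an assumption borrowed from the figure mentioned in the comments.

```ruby
# Hypothetical helper: counts newline characters by reading fixed-size chunks.
# chunk_size defaults to 64 KB, which is an assumed value, not a tuned one.
def count_lines_chunked(file_name, chunk_size = 64 * 1024)
  count = 0
  File.open(file_name, "rb") do |file|
    buffer = String.new # reused for every read to avoid allocating new strings
    # IO#read with a positive length returns nil at end of file.
    while file.read(chunk_size, buffer)
      count += buffer.count("\n")
    end
  end
  count
end
```

Note that this counts newline characters, as wc -l does, so a final line without a trailing newline is not counted, whereas foreach would count it.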

the Tin Man
  • 1
    You can also read in large chunks with `read` and count newline characters. – tadman Oct 07 '16 at 00:58
  • @the Tin Man, thanks for the great answer that includes speed optimisation. @tadman, cheers for the extra hint :) – gogofan Oct 07 '16 at 14:17
  • Regarding using `read` and counting newlines, see the added section in the answer. – the Tin Man Oct 07 '16 at 17:14
  • I meant using `read` with a fixed length limit, not read on the whole file. Using a smaller buffer, even 64KB, usually outperforms converting these blocks into individual strings. `wc` will be dramatically faster if available, but as always, *be very wary* when passing in arguments. Use the [open3 library](https://ruby-doc.org/stdlib-2.3.0/libdoc/open3/rdoc/Open3.html) and pass in arguments individually to avoid security problems. If not that, then use [`shellescape`](https://ruby-doc.org/stdlib-2.3.1/libdoc/shellwords/rdoc/Shellwords.html#method-c-shellescape) on any arguments. – tadman Oct 08 '16 at 03:12
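
The Open3 suggestion in the last comment could be sketched as follows. This assumes a `wc` binary is available on the PATH; `count_lines_wc` is a hypothetical name. Because each argument is passed separately, no shell is involved and the file name is never interpreted as shell syntax.

```ruby
require "open3"

# Hypothetical helper: asks wc for the line count without going through a shell.
def count_lines_wc(file_name)
  # "--" marks the end of options, so a name starting with "-" is
  # treated as a file rather than as a flag.
  stdout, status = Open3.capture2("wc", "-l", "--", file_name)
  raise "wc failed for #{file_name}" unless status.success?
  # wc prints the count followed by the file name; to_i reads the leading number.
  stdout.to_i
end
```

Compared with the backtick form, this avoids having to escape the file name at all, which is the safer default when the name comes from user input.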