Upon close examination I found that a part of one line is wrapped to form the next line. How to get the correct count?
You need to show an example of the incoming data if you want us to help beyond generic answers.
To fix the problem, you have to be able to identify the line. We can't help you there because it could look like anything. Making a wild guess, I'd say that one of the columns had an embedded new-line in it, which forces the line to wrap.
It the file is a true CSV file, that column should be wrapped in double-quotes, so you could search the file for lines that do NOT end with whatever data type should be in the last column, then read the next line, join them, then rewrite the file. But, again, we have nothing to work with, because your file's format could be a huge number of different things.
Your best bet is to use the CSV class that comes with Ruby, and let it read the file, instead of trying to treat it like a text file. CSV files are text, but they are formatted to maintain the columns and rows, so using the CSV class will give you a better chance of getting at the data.
Looking at your code:
There are a number of ways to count the number of lines in a file, including the easiest which is:
`wc -l /path/to/file`.to_i
if you're using *nix.
Using File.open(file).readlines.to_a
is horribly redundant and not fast or scalable if your file is big.
readlines
returns an array.
to_a
returns an array.
Why turn the array into an array?
readlines
loads an entire file into memory, then splits it on line ends into an array. That process can be a lot slower than simply reading the file line-by-line and incrementing a counter, plus "slurping" can make your program crawl if the file is larger than available memory.
See "Why is "slurping" a file not a good practice?" for more information.
compact
removes nils from an array. readlines
should never return any nils so compact
will iterate over the array looking for something that shouldn't exist.
count
returns an integer.
to_i
converts the receiver to an integer.
In other words, to_i
is turning an integer into an integer. Why?
If you want to do it in Ruby instead of using wc -l
, do something simple and fast:
lines_in_file = 0
File.foreach(some_file) { lines_in_file += 1 }
After running that, lines_in_file
will contain the number of lines read. Memory won't be impacted and it'll run like blue blazes on huge files.