I am trying to find the most efficient way to process lines in a Ruby string in reverse order. These are the two approaches I have:

def double_reverse(lines)
    lines.reverse!
    lines.each_line do |line|
        line.chomp!
        line.reverse!
        puts line
    end
end

def split_and_reverse(lines)
    lines.split("\n").reverse.each do |line|
        puts line
    end
end

if __FILE__ == $0
    lines = "This is the first line.\nThis is the second line"
    double_reverse(lines)
    lines = "This is the first line.\nThis is the second line"
    split_and_reverse(lines)
end

I am wondering which one will use less memory. Is there any other approach that uses even fewer resources? I am primarily concerned about memory usage, but if I can reduce CPU usage too, that would be nice.

EDIT 1:

In my use case, the string `lines` can contain more than a million lines. If `split` is going to double the memory usage, that is definitely a problem for me. It may not be a problem if the Ruby VM is smart enough to determine that `lines` won't be used after the call to `split` and releases its memory. On the other hand, the in-place `reverse!` approach seems, in theory, to be more efficient, since it can be done without making any copy of `lines`.
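Whether either approach actually avoids allocations can be measured rather than guessed. The sketch below is a rough check, assuming CRuby 2.2 or later, where `GC.stat(:total_allocated_objects)` reports a running total of objects allocated. It runs each approach over a hypothetical 100,000-line string (without `puts`, so only the string work is counted) and includes the `Array#reverse_each` variant for comparison. Object counts are only a proxy: a multi-megabyte string still counts as one object, so this says nothing about bytes.

```ruby
# Rough allocation comparison. Assumes CRuby 2.2+ for GC.stat(:total_allocated_objects).
# Object counts are only a proxy for memory: they say nothing about bytes per object.

# Returns how many objects were allocated while the block ran.
def allocations
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
end

# Hypothetical input: 100,000 short lines joined with "\n".
text = Array.new(100_000) { |n| "This is line #{n}" }.join("\n")
copy = text.dup # the in-place approach mutates its string, so it gets its own copy

results = {
  double_reverse: allocations {
    copy.reverse!
    copy.each_line { |line| line.chomp!; line.reverse! }
  },
  split_and_reverse: allocations { text.split("\n").reverse.each { |_line| } },
  split_reverse_each: allocations { text.split("\n").reverse_each { |_line| } }
}

results.each { |name, count| puts "#{name}: #{count} objects allocated" }
```

Whatever the exact numbers, `each_line` allocates a new `String` for every line it yields, so the in-place `reverse!` does not avoid per-line copies. The more important difference is lifetime: `split` keeps every line, plus the array, reachable until the loop finishes, whereas each string yielded by `each_line` can be collected as soon as the block is done with it.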

russoue
  • You can test the CPU usage using [Benchmark](http://www.ruby-doc.org/stdlib-2.1.5/libdoc/benchmark/rdoc/index.html) or [Fruity](https://github.com/marcandre/fruity). I'd recommend Fruity for ease of use. – the Tin Man Dec 04 '14 at 00:12
  • Are you reversing the strings, or an array of strings? There's a difference. The first reverses the string then splits it, so all the component strings are reversed. The second splits the string into an array, then processes the array backwards, so you're comparing apples and oranges. Also, the first is doing more work, so it will usually take longer. – the Tin Man Dec 04 '14 at 00:15
  • @theTinMan you are right that the first one reverses the strings. That's why I have a `reverse!` inside the loop: it restores each line's original character order. – russoue Dec 04 '14 at 01:18
  • If you have big files, then you want to avoid using `read` or `readlines` or any other method that attempts to slurp the data. Instead, use `foreach` or `each_line` to iterate over the file on disk. Once a file grows beyond a fairly small size, line-by-line I/O is actually faster than reading the entire file and splitting it in memory. Plus, line-by-line I/O is extremely memory-efficient, so scalability is not a problem. See "[Why is slurping a file bad?](http://stackoverflow.com/questions/25189262/why-is-slurping-a-file-bad/25189286#25189286)". – the Tin Man Dec 04 '14 at 04:24
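As the first comment suggests, CPU time can be compared with the standard library's `Benchmark` module. A minimal sketch: the loops are inlined from the question and from the `Array#reverse_each` suggestion in the answer, each line is assigned to a variable instead of being passed to `puts` (so terminal output doesn't dominate), and the 200,000-line input is an arbitrary assumption. Actual timings depend on the machine and Ruby version.

```ruby
require "benchmark"

# Arbitrary test input: 200,000 short lines joined with "\n".
text = Array.new(200_000) { |n| "This is line #{n}" }.join("\n")
copy = text.dup # the in-place approach mutates its string, so give it its own copy
last = nil      # receives each line instead of puts, to keep I/O out of the timings

Benchmark.bm(20) do |x|
  x.report("double_reverse") do
    copy.reverse!
    copy.each_line do |line|
      line.chomp!   # drop the "\n" that each_line keeps
      line.reverse! # restore the line's original character order
      last = line
    end
  end
  x.report("split.reverse.each") { text.split("\n").reverse.each { |line| last = line } }
  x.report("split.reverse_each") { text.split("\n").reverse_each { |line| last = line } }
end
```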

1 Answer

Try using Array#reverse_each:

lines.split("\n").reverse_each do |line|
    puts line
end

Alternatively, if conserving memory is your top priority, here's a way using String#rindex that should not make any significant memory allocations beyond the original string, apart from the one substring created for each line:

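j = lines.length - 1 # lines is the full string, not an array

while j >= -1
  # rindex counts a negative offset from the end of the string, so only search
  # while j is a real index; once j is -1, the remaining line starts at 0.
  i = (j >= 0 && lines.rindex("\n", j)) || -1
  line = lines[i + 1, j - i] # start/length slice; a range ending at -1 would run to the end
  puts line
  j = i - 1
end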
Matt
  • The [`Array` version of `reverse_each`](http://ruby-doc.org/core-2.1.5/Array.html#method-i-reverse_each) doesn’t create a new array, and since you get an array from `split` this should actually be better than creating the intermediate array. – matt Dec 04 '14 at 00:43
  • If I use `split` am I going to use more memory than the in-place reverse approach? – russoue Dec 04 '14 at 01:44
  • @russoue Added a way using String#rindex that shouldn't take any significant extra memory. – Matt Dec 04 '14 at 02:32
  • Come on, @Matt, you made that up. `reverse_each`! Ha! Oh, wait, it's actually there. Why have I never seen it? Sorry for doubting you. – Cary Swoveland Dec 04 '14 at 07:19
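The `String#rindex` loop from the answer can be packaged as a method that yields lines from last to first and returns an `Enumerator` when called without a block, so it composes with calls like `first(n)` that stop early. This is a sketch: the name `each_line_reverse` is hypothetical, and treating a trailing newline as a line terminator (the same convention as `each_line`) is an assumption.

```ruby
# Yields each line of str from last to first without building an Array of all
# lines. Hypothetical helper; returns an Enumerator when no block is given.
def each_line_reverse(str)
  return enum_for(__method__, str) unless block_given?
  return str if str.empty?

  j = str.length - 1
  # Assumption: a trailing "\n" terminates the last line rather than starting
  # an empty one (the same convention as each_line).
  j -= 1 if str.end_with?("\n")

  while j >= -1
    # rindex treats a negative offset as counting from the end, so only call it
    # while j is a real index; otherwise the current line starts at 0.
    i = (j >= 0 && str.rindex("\n", j)) || -1
    yield str[i + 1, j - i] # start/length slice; excludes the newline
    j = i - 1
  end
  str
end

text = "This is the first line.\nThis is the second line"
each_line_reverse(text) { |line| puts line }
# prints "This is the second line", then "This is the first line."
p each_line_reverse(text).first(1)
# prints ["This is the second line"]
```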