
I want to write a class that finds a target string in a text file and prints the line number and the position within that line.

class ReadFile

    def find_string(filename, string)
        line_num = 0
        IO.readlines(filename).each do |line|
            line_num += 1
            if line.include?(string)
                puts line_num
                puts line.index(string)
            end
        end
    end

end

a = ReadFile.new
a.find_string('test.txt', "abc")

If the text file is very large (1 GB, 10 GB, ...), this method performs very poorly.

Is there a better solution?

pangpang
    `IO.readlines` attempts to read the entire file into memory. Perhaps try using `File.open(filename).each` as that streams the file. Large files are still likely to take some time just to iterate over the number of lines and it also depends on the length of the lines being loaded. – Marc Baumbach Aug 24 '14 at 05:07

2 Answers


Use `foreach` to read the file one line at a time instead of loading it all into memory, and `with_index` to track the line number (0-based):

IO.foreach(filename).with_index do |line, index|
  if found = line.index(string)
    puts "#{index+1}, #{found+1}"
    break  # remove this line to report every matching line, not just the first
  end
end

`readlines` gives you performance problems because it reads the entire file into an array of lines in memory before iteration starts.
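
If you need every occurrence rather than only the first, the same line-at-a-time approach can be extended. The following is a minimal sketch, not a definitive implementation: `find_all_occurrences` is a hypothetical helper name, and it uses `File.foreach`, which is the same method as `IO.foreach`, inherited by `File`. It assumes that overlapping matches should each be reported and that collecting the results in an array is acceptable (for a very common string you may prefer to print as you go).

```ruby
# Hypothetical helper (name is an assumption, not from the original post).
# Returns [line_number, column] pairs, both 1-based, for every occurrence
# of +needle+ in the file, reading one line at a time.
def find_all_occurrences(filename, needle)
  matches = []
  File.foreach(filename).with_index(1) do |line, line_num|
    offset = 0
    # String#index accepts a start offset, so repeated hits on the
    # same line are found by resuming just after the previous hit.
    while (col = line.index(needle, offset))
      matches << [line_num, col + 1]
      offset = col + 1
    end
  end
  matches
end
```

Usage would look like `find_all_occurrences('test.txt', 'abc').each { |line_num, col| puts "#{line_num}, #{col}" }`.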

PinnyM

This is a variant of @PinnyM's answer. It uses `find`, which I think is more descriptive than looping and breaking, but does the same thing. The small penalty is that, once the line is found, `line.index(string)` has to scan it a second time to determine the offset where the string begins.

# with_index(1) makes the reported line number 1-based, like the column
line, index = IO.foreach(filename).with_index(1).find { |line,index|
                line.include?(string) }
if line
  puts "'#{string}' found in line #{index}, " +
         "beginning in column #{line.index(string)+1}"
else
  puts "'#{string}' not found"
end
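
If scanning the matching line a second time matters to you, one option is to wrap the search in a method and return as soon as `String#index` reports a hit. This is only a sketch under stated assumptions: `find_first_occurrence` is a hypothetical name, and it uses `File.foreach` (the same method as `IO.foreach`, inherited by `File`). Both returned values are 1-based.

```ruby
# Hypothetical helper (name is an assumption, not from the original post).
# Returns [line_number, column], both 1-based, for the first occurrence
# of +needle+, or nil if the file does not contain it.
def find_first_occurrence(filename, needle)
  File.foreach(filename).with_index(1) do |line, line_num|
    col = line.index(needle)           # a single scan per line
    return [line_num, col + 1] if col  # stop reading at the first hit
  end
  nil
end
```

The caller can then branch on the result, for example `if (hit = find_first_occurrence('test.txt', 'abc'))`, and print `hit[0]` and `hit[1]`.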
Cary Swoveland