12

I have a gzip file and currently I read it like this:

require 'zlib'

infile = open("file.log.gz")
gz = Zlib::GzipReader.new(infile)
output = gz.read
puts output

I think this converts the file to a string, but I would like to read it line by line.

The file contains some warning messages mixed in with garbage. I want to grep those warning messages and write them to another file. Some warning messages are repeated, so I have to make sure that I only capture each one once. Reading line by line would help with that.

the Tin Man
infinitloop

3 Answers

23

You should be able to simply loop over the gzip reader as you would with a regular stream (according to the docs):

require 'zlib'

infile = open("file.log.gz")
gz = Zlib::GzipReader.new(infile)
gz.each_line do |line|
  puts line
end
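
On the question of closing the file, here is a minimal sketch of one alternative, not taken from the original answer: the block form of `Zlib::GzipReader.open` takes a path and closes the reader, along with the file it opened, when the block exits. The helper name `each_gzip_line` is hypothetical.

```ruby
require 'zlib'

# Hypothetical helper: yields each line of a gzip file to the caller's block.
def each_gzip_line(path)
  # The block form of GzipReader.open closes the reader (and the file it
  # opened) when the block exits, even if an exception is raised.
  Zlib::GzipReader.open(path) do |gz|
    gz.each_line { |line| yield line }
  end
end

# Usage, assuming file.log.gz exists in the current directory:
#   each_gzip_line("file.log.gz") { |line| puts line }
```

With this form there is no separate `infile` handle left to close by hand.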
Tigraine
  • Does this automatically close the file after the reading is complete? – Rohit Aug 14 '15 at 20:49
  • Yes and no - if the GzipReader operates on a file directly you might want to close it. But in this case I assumed the `open` method opens the file and thus you have to close the `infile` IO stream. – Tigraine Aug 24 '15 at 08:21
  • Wow! 4 years on and still replying to comments on your answer. Now that is dedication! Thanks again. – Rohit Aug 25 '15 at 04:47
  • @Tigraine Getting - list_failed_logins.rb:2: uninitialized constant Zlib (NameError) – Nameless Aug 02 '17 at 10:19
  • @AjayAradhya you might have to `require 'zlib'` in your file – Tigraine Aug 02 '17 at 13:48
  • Is there a way to read a particular file inside a gz? As you know, it does not consist of one file but many files – Moses Liao GZ Jan 30 '20 at 09:02
  • gzip is only the compression format. If you have multiple files inside the gz file it's usually a tarball (so only one file) that then contains multiple files. How to then read the specific files inside the tarballs you'd have to check. Info on the tarball format: https://en.wikipedia.org/wiki/Tar_(computing) – Tigraine Jan 31 '20 at 14:31
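
For the tarball case raised in the comments, a rough sketch, assuming the archive is a gzip-compressed tar and using the `Gem::Package::TarReader` class that ships with RubyGems. The helper name `read_tar_gz_entry` and the entry name in the usage line are hypothetical.

```ruby
require 'zlib'
require 'rubygems/package'

# Hypothetical helper: returns the contents of one entry inside a .tar.gz,
# or nil if no regular file in the archive has that name.
def read_tar_gz_entry(archive_path, entry_name)
  Zlib::GzipReader.open(archive_path) do |gz|
    # TarReader walks the decompressed stream one entry at a time.
    Gem::Package::TarReader.new(gz) do |tar|
      tar.each do |entry|
        return entry.read if entry.file? && entry.full_name == entry_name
      end
    end
  end
  nil
end

# Usage, assuming logs.tar.gz contains an entry named "file.log":
#   puts read_tar_gz_entry("logs.tar.gz", "file.log")
```

This reads the whole entry into memory, which is fine for small files but may not suit very large ones.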
1

Try this:

require 'zlib'

infile = open("file.log.gz")
gz = Zlib::GzipReader.new(infile)
# gets returns nil at end of file, which ends the loop
while output = gz.gets
  puts output
end
Sergio Tulentsev
  • Using `while` works, but `each_line`, as @Tigraine showed, is more idiomatic in Ruby. – the Tin Man Dec 30 '11 at 23:31
  • I know. I even thought of deleting my answer, but then decided to leave it, for completeness. – Sergio Tulentsev Dec 30 '11 at 23:34
  • That's a good reason. I periodically show alternate ways to accomplish something. And, that's the beauty of Ruby, we can write in styles that are closer to how we've learned in other languages, which helps it be more accessible and portable to us as programmers. That was in line with Matz's goal of it being transparent to the developer. – the Tin Man Dec 30 '11 at 23:41
1

Other answers show how to read the file line by line, but not how to only capture the errors once. Building on @Tigraine's answer:

require 'set'
require 'zlib'

infile = open("file.log.gz")
gz = Zlib::GzipReader.new(infile)

errors = Set.new
# or ...
# errors = [].to_set

gz.each_line do |line|
  errors << line if (line[/^Error:/])
  # or ...
  # errors << line if (line['Error:'])
end

puts errors

Set acts like an Array but is built on a Hash, so it behaves like a Hash where only the keys matter, i.e. only unique values are stored. If you try to add a duplicate it is discarded, leaving you with only the unique values. You could use an Array and call uniq on it afterwards, but a Set handles this for you up front.

>> require 'set'
=> true
>> errors = Set.new
=> #<Set: {}>
>> errors << 'a'
=> #<Set: {"a"}>
>> errors << 'b'
=> #<Set: {"a", "b"}>
>> errors << 'a'
=> #<Set: {"a", "b"}>
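
To cover the last step in the question, writing the unique lines to another file, here is a minimal sketch under a few assumptions: the helper name `write_unique_matches`, the output file name, and the `/^Warning:/` pattern in the usage line are all hypothetical, since the question does not show what the warning lines look like.

```ruby
require 'set'
require 'zlib'

# Hypothetical helper: copies each distinct matching line from a gzip log to
# a plain-text file, in first-seen order. Returns the number of lines written.
def write_unique_matches(gz_path, out_path, pattern)
  seen = Set.new
  File.open(out_path, "w") do |out|
    Zlib::GzipReader.open(gz_path) do |gz|
      gz.each_line do |line|
        # Set#add? returns nil when the line is already in the set,
        # so repeated lines are skipped.
        out.write(line) if line.match?(pattern) && seen.add?(line)
      end
    end
  end
  seen.size
end

# Usage, assuming warnings start with "Warning:":
#   write_unique_matches("file.log.gz", "warnings.log", /^Warning:/)
```

Writing as you go keeps only the distinct matching lines in memory rather than the whole file.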
the Tin Man