Study the documentation for IO's readlines and foreach, especially the sep=$/ parameter:
...where lines are separated by sep.
Allowing us to define the end of line lets us read files in very interesting ways, such as in chunks:
data = File.foreach(ENV['HOME'] + '/Desktop/test.txt', "\n\n").map { |block|
  block.lines.reject { |l| l.rstrip.empty? }.last
}
data
# => ["LOOKING FOR THIS STRING: open\n", "LOOKING FOR THIS STRING: open\n"]
I stuck your sample data into a file, opened it and told IO to use the occurrence of two adjacent line-ends to mark the end of a line. Then it's simple to split the text into individual lines, reject empty lines and select the last one in the block.
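To see what the separator actually hands back, here is a minimal self-contained sketch. It assumes a shortened version of the sample data with exactly one blank line between blocks, and writes it to a temporary file instead of ~/Desktop/test.txt; SAMPLE and blocks_from are names made up for this illustration.

```ruby
require 'tempfile'

# Hypothetical sample data: two blocks separated by one blank line.
SAMPLE = <<~TEXT
  Here's an IP: 192.168.1.1
  here is some data
  LOOKING FOR THIS STRING: open

  Here's an IP: 192.168.1.2
  here is some data
  LOOKING FOR THIS STRING: open
TEXT

# Returns each record as one string; a record ends at "\n\n" or at end of file.
def blocks_from(path)
  File.foreach(path, "\n\n").to_a
end

Tempfile.create('blocks') do |file|
  file.write(SAMPLE)
  file.flush
  blocks_from(file.path).each { |block| p block }
end
```

The first record still carries its trailing "\n\n", which is why the code above rejects lines that are empty after rstrip before picking the last one.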
How would I take this though and iterate back through until I reach the IP address?
data = File.foreach(ENV['HOME'] + '/Desktop/test.txt', "\n\n").map { |block|
  chunk = block.lines.reject { |l| l.rstrip.empty? }
  [
    chunk.first,
    chunk.last
  ]
}
data
# => [["Here's an IP: 192.168.1.1\n", "LOOKING FOR THIS STRING: open\n"],
# ["Here's an IP: 192.168.1.2\n", "LOOKING FOR THIS STRING: open\n"]]
I would like to capture from IP address to "open", but have to search for "open" and then go back to wherever the IP address is.
data = File.foreach(ENV['HOME'] + '/Desktop/test.txt', "\n\n").map { |block|
  block.lines.reject { |l| l.rstrip.empty? }
}
data
# => [["Here's an IP: 192.168.1.1\n",
# "here is some data\n",
# "Here is more dataa\n",
# "LOOKING FOR THIS STRING: open\n"],
# ["Here's an IP: 192.168.1.2\n",
# "here is some data\n",
# "Here is more dataa\n",
# "LOOKING FOR THIS STRING: open\n"]]
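Because each block already holds everything from the IP line through the "open" line, there is no need to search backwards: both values can be pulled out of the block directly. The following is only a sketch. IP_PATTERN, STATUS_PATTERN and ip_status_pairs are hypothetical names, and the patterns assume a dotted-quad IPv4 address and the literal marker text from the sample data.

```ruby
require 'tempfile'

# Assumed formats; adjust both patterns for real input.
IP_PATTERN     = /\b(?:\d{1,3}\.){3}\d{1,3}\b/
STATUS_PATTERN = /LOOKING FOR THIS STRING: (\w+)/

# Returns [ip, status] pairs, one per block that contains both values.
def ip_status_pairs(path)
  File.foreach(path, "\n\n").map { |block|
    ip     = block[IP_PATTERN]        # first IP-looking string in the block
    status = block[STATUS_PATTERN, 1] # first capture group, e.g. "open"
    [ip, status] if ip && status
  }.compact
end

Tempfile.create('blocks') do |file|
  file.write(<<~TEXT)
    Here's an IP: 192.168.1.1
    here is some data
    LOOKING FOR THIS STRING: open

    Here's an IP: 192.168.1.2
    here is some data
    LOOKING FOR THIS STRING: open
  TEXT
  file.flush
  p ip_status_pairs(file.path)
  # => [["192.168.1.1", "open"], ["192.168.1.2", "open"]]
end
```

Blocks missing either value are dropped by compact, so a malformed block does not produce a half-filled pair.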
Be careful using readlines when processing text files. It slurps the entire file into memory, which is usually a bad idea, especially with any sort of log file: they can get REALLY big and consume all available RAM, which will bring your machine to its knees. See "Why is "slurping" a file not a good practice?" for more information.
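As a small illustration of the difference, the sketch below counts matching lines by streaming the file with foreach, so only one line needs to be held at a time. count_matching_lines is a made-up helper name and the three-line input is invented for the example.

```ruby
require 'tempfile'

# Streams the file line by line; File.readlines(path).count { ... } would
# give the same number but would build an array of every line first.
def count_matching_lines(path, pattern)
  File.foreach(path).count { |line| line.match?(pattern) }
end

Tempfile.create('log') do |file|
  file.write("status: open\nstatus: closed\nstatus: open\n")
  file.flush
  puts count_matching_lines(file.path, /open$/) # prints 2
end
```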
You could accomplish something similar using Enumerable's slice_* methods, but letting IO handle it with the separator is more straightforward and should be faster.
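For comparison, a rough sketch of the slice_* route using slice_before, which starts a new slice at every line that looks like a block header. It assumes every block begins with the "Here's an IP:" prefix seen in the sample data; blocks_by_header is a hypothetical name.

```ruby
require 'tempfile'

# Starts a new slice whenever a line begins with the assumed header prefix.
def blocks_by_header(path)
  File.foreach(path)
      .slice_before { |line| line.start_with?("Here's an IP:") }
      .to_a
end

Tempfile.create('blocks') do |file|
  file.write(<<~TEXT)
    Here's an IP: 192.168.1.1
    here is some data
    LOOKING FOR THIS STRING: open

    Here's an IP: 192.168.1.2
    here is some data
    LOOKING FOR THIS STRING: open
  TEXT
  file.flush
  blocks_by_header(file.path).each { |lines| p lines }
end
```

Here the blank separator line ends up as the last element of the first slice, so it still has to be filtered out, which is part of why the separator approach is tidier.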
If the double newlines are sometimes missing, it's still easily done by removing the sep value and letting one of the slice_* methods do the lifting. Note that the blank separator lines are still present in the result afterwards; figuring out how to strip those is left as an exercise for the reader.
I added an extra block that didn't have the separator. Based on a file looking like:
Here's an IP: 192.168.1.1
here is some data
Here is more dataa
LOOKING FOR THIS STRING: open

Here's an IP: 192.168.1.2
here is some data
Here is more dataa
LOOKING FOR THIS STRING: open
Here's an IP: 192.168.1.2
here is some data
Here is more dataa
LOOKING FOR THIS STRING: open
The code works like:
data = File.foreach(ENV['HOME'] + '/Desktop/test.txt')
  .slice_after { |l| l[/open$/] }
  .to_a
data
# => [["Here's an IP: 192.168.1.1\n",
# "here is some data\n",
# "Here is more dataa\n",
# "LOOKING FOR THIS STRING: open\n"],
# ["\n",
# "Here's an IP: 192.168.1.2\n",
# "here is some data\n",
# "Here is more dataa\n",
# "LOOKING FOR THIS STRING: open\n"],
# ["Here's an IP: 192.168.1.2\n",
# "here is some data\n",
# "Here is more dataa\n",
# "LOOKING FOR THIS STRING: open\n"]]
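One possible way to deal with the leftover blank lines, offered as a sketch rather than the only answer: filter them out of each slice after slicing. open_blocks is a hypothetical name, and the input is a shortened version of the file above with one separator present and one missing.

```ruby
require 'tempfile'

# Slices after every line ending in "open", then drops blank lines from each
# slice and discards any slice that held nothing but blank lines.
def open_blocks(path)
  File.foreach(path)
      .slice_after { |line| line.match?(/open$/) }
      .map { |lines| lines.reject { |l| l.strip.empty? } }
      .reject(&:empty?)
end

Tempfile.create('blocks') do |file|
  file.write(<<~TEXT)
    Here's an IP: 192.168.1.1
    LOOKING FOR THIS STRING: open

    Here's an IP: 192.168.1.2
    LOOKING FOR THIS STRING: open
    Here's an IP: 192.168.1.2
    LOOKING FOR THIS STRING: open
  TEXT
  file.flush
  open_blocks(file.path).each { |lines| p lines }
end
```

The final reject(&:empty?) covers files that end with extra blank lines, which would otherwise show up as an empty trailing slice.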