Trying to grab data from a log file that contains multiple strings

Question

I have a log file with contents like this sample:

Here's an IP: 192.168.1.1
here is some data
Here is more dataa
LOOKING FOR THIS STRING: open

Here's an IP: 192.168.1.2
here is some data
Here is more dataa
LOOKING FOR THIS STRING: open

I am able to extract the IP address using a regex, data[/\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}/]; however, I am trying to figure out a way to look for the word "open" (for example) and then work backwards through the log to find the IP address it belongs to. I also need to do this for multiple instances.

The sample above is the content of the data variable.

One option I thought of was splitting the data by newline into an array, finding all of the items that contain "open", grabbing their indexes, and working my way back until I find an IP address. I'm not sure this is a feasible method.

In some cases, two newlines (\n\n) do not separate the data, so I can't use that as a "delimiter", so to speak. I specifically need to be able to iterate back up through the log until I find an IP address.

Is there any better way to accomplish this by chance?

You're changing your criteria after the fact. Don't do that; It makes for very poor questions and if done habitually will irritate those helping you. If the sample input is not correct please fix it. Also, don't add "edited" or "updated" tags to your text. We can see what's changed. Add new text as if it was there all along. — the Tin Man, Dec 26 '19 at 22:41
Suppose you create a method `m`, called `m(file_name)`. 1. Please show the Ruby object that you want `m` to return when the file `file_name` contains the text shown in your example. (More generally, whenever you give an example show the desired Ruby object that is to be returned.) 2. You said, "In some cases, two newlines do not separate the data...". In what other ways might the blocks of data be separated? 3. Can the entire file be read into an array (using `readlines`, say) or is the file so large that it must be read line-by-line (using `foreach`, say)? Please answer by editing — Cary Swoveland, Dec 26 '19 at 23:48
@LewlSauce : Text files are not meant to be read backwards. I suggest you read the file forward, keeping the lines read so far in an array. Once you encounter a line with your keyword _open_, go backwards through the array to collect your IP address. If there is none, throw away the array collected so far, and continue reading with a fresh array. — user1934428, Dec 27 '19 at 08:12

the Tin Man · Accepted Answer · 2019-12-26T22:48:28.747

Study IO's readlines and foreach documentation, especially the sep=$/ parameter:

...where lines are separated by sep.

Being able to define what marks the end of a line lets us read files in very interesting ways, such as in chunks:

data = File.foreach(ENV['HOME'] + '/Desktop/test.txt', "\n\n").map { |block|
  block.lines.reject { |l| l.rstrip.empty? }.last
}

data
# => ["LOOKING FOR THIS STRING: open\n", "LOOKING FOR THIS STRING: open\n"]

I stuck your sample data into a file, opened it and told IO to use the occurrence of two adjacent line-ends to mark the end of a line. Then it's simple to split the text into individual lines, reject empty lines and select the last one in the block.

How would I take this though and iterate back through until I reach the IP address?

data = File.foreach(ENV['HOME'] + '/Desktop/test.txt', "\n\n").map { |block|
  chunk = block.lines.reject { |l| l.rstrip.empty? }
  [
    chunk.first, 
    chunk.last
  ]
}

data
# => [["Here's an IP: 192.168.1.1\n", "LOOKING FOR THIS STRING: open\n"],
#     ["Here's an IP: 192.168.1.2\n", "LOOKING FOR THIS STRING: open\n"]]

I would like to capture from IP address to "open", but have to search for "open" and then go back to wherever the IP address is.

data = File.foreach(ENV['HOME'] + '/Desktop/test.txt', "\n\n").map { |block|
  block.lines.reject { |l| l.rstrip.empty? }
}

data
# => [["Here's an IP: 192.168.1.1\n",
#      "here is some data\n",
#      "Here is more dataa\n",
#      "LOOKING FOR THIS STRING: open\n"],
#     ["Here's an IP: 192.168.1.2\n",
#      "here is some data\n",
#      "Here is more dataa\n",
#      "LOOKING FOR THIS STRING: open\n"]]
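
Once each block is available as a unit, pairing "open" with its IP address no longer requires walking backwards. A minimal sketch, assuming blocks are reliably separated by a blank line and reusing the IP regex from the question; the Tempfile and the variable names (sample, ip_regex, open_ips) are stand-ins for illustration, not part of the original code:

```ruby
require 'tempfile'

# Sample log contents from the question.
sample = <<~LOG
  Here's an IP: 192.168.1.1
  here is some data
  Here is more dataa
  LOOKING FOR THIS STRING: open

  Here's an IP: 192.168.1.2
  here is some data
  Here is more dataa
  LOOKING FOR THIS STRING: open
LOG

# IP pattern taken from the question.
ip_regex = /\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}/

open_ips = Tempfile.create('log') do |file|
  file.write(sample)
  file.flush

  # Read blank-line-separated blocks and keep the IP of every block
  # that also contains a line ending in "open".
  File.foreach(file.path, "\n\n").filter_map do |block|
    block[ip_regex] if block.match?(/open$/)
  end
end

open_ips
# => ["192.168.1.1", "192.168.1.2"]
```

filter_map needs Ruby 2.7 or newer; on older versions, map followed by compact gives the same result.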

Be careful using readlines when processing text files. It slurps the entire file into memory, which is usually a bad idea, especially with log files: they can get REALLY big and consume all available RAM, which will bring your machine to its knees. See "Why is "slurping" a file not a good practice?" for more information.

You could accomplish something similar using Enumerable's slice_* methods, but letting IO handle it with the separator is more straightforward and should be faster.
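
For comparison, a sketch of the slice_* route, assuming blank lines mark the block boundaries. It uses slice_after, one member of the slice_* family; StringIO stands in for the file so the example is self-contained, and the variable names are illustrative:

```ruby
require 'stringio'

# Sample log contents from the question, held in memory instead of a file.
log = StringIO.new(<<~LOG)
  Here's an IP: 192.168.1.1
  here is some data
  Here is more dataa
  LOOKING FOR THIS STRING: open

  Here's an IP: 192.168.1.2
  here is some data
  Here is more dataa
  LOOKING FOR THIS STRING: open
LOG

blocks = log.each_line
  .slice_after { |line| line.strip.empty? } # a blank line closes a block
  .map { |lines| lines.reject { |line| line.strip.empty? } } # drop the blank lines
  .reject(&:empty?) # ignore slices that held only blank lines

blocks.map(&:size)
# => [4, 4]
```

With a real file, File.foreach(path) without a separator argument could take the place of log.each_line.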

If the double newlines are sometimes missing, it's still easily done: remove the sep value and let one of the slice_* methods do the lifting. Note that the blank lines from any double newlines that are present remain in the result; figuring out how to strip those is left as an exercise for the reader.

I added an extra block that isn't preceded by the separator, so the file looks like:

Here's an IP: 192.168.1.1
here is some data
Here is more dataa
LOOKING FOR THIS STRING: open

Here's an IP: 192.168.1.2
here is some data
Here is more dataa
LOOKING FOR THIS STRING: open
Here's an IP: 192.168.1.2
here is some data
Here is more dataa
LOOKING FOR THIS STRING: open

With that file, the code works like this:

data = File.foreach(ENV['HOME'] + '/Desktop/test.txt')
  .slice_after { |l| l[/open$/] }
  .to_a

data
# => [["Here's an IP: 192.168.1.1\n",
#      "here is some data\n",
#      "Here is more dataa\n",
#      "LOOKING FOR THIS STRING: open\n"],
#     ["\n",
#      "Here's an IP: 192.168.1.2\n",
#      "here is some data\n",
#      "Here is more dataa\n",
#      "LOOKING FOR THIS STRING: open\n"],
#     ["Here's an IP: 192.168.1.2\n",
#      "here is some data\n",
#      "Here is more dataa\n",
#      "LOOKING FOR THIS STRING: open\n"]]
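
If the blocks cannot be delimited reliably, another option is to skip the grouping step and remember the most recent IP address while reading forward, which gives the effect of searching backwards from "open" without rereading anything. This is a different technique from the slice_after code above, sketched under the assumption that every block starts with its IP line; StringIO stands in for the file and the variable names are illustrative:

```ruby
require 'stringio'

# The three-block sample from above; the last block has no blank line before it.
log = StringIO.new(<<~LOG)
  Here's an IP: 192.168.1.1
  here is some data
  Here is more dataa
  LOOKING FOR THIS STRING: open

  Here's an IP: 192.168.1.2
  here is some data
  Here is more dataa
  LOOKING FOR THIS STRING: open
  Here's an IP: 192.168.1.2
  here is some data
  Here is more dataa
  LOOKING FOR THIS STRING: open
LOG

# IP pattern taken from the question.
ip_regex = /\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}/

last_ip = nil # most recent IP seen that has not yet been paired with "open"
open_ips = []

log.each_line do |line|
  if (ip = line[ip_regex])
    last_ip = ip
  elsif last_ip && line.match?(/open$/)
    open_ips << last_ip
    last_ip = nil # never credit a later "open" to this IP again
  end
end

open_ips
# => ["192.168.1.1", "192.168.1.2", "192.168.1.2"]
```

With a real log, File.foreach(path) could replace log.each_line; it also reads one line at a time, so only the current line and the last IP are held in memory.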

Thanks for the reply. How would I take this though and iterate back through until I reach the IP address? I would like to capture from IP address to "open", but have to search for "open" and then go back to whereever the IP address is. — LewlSauce, Dec 26 '19 at 22:05
You don't have to iterate back, the IP is captured in the block. I'll add a bit more code showing what I'd do. — the Tin Man, Dec 26 '19 at 22:07
Gotcha. Ok I see where we're going. So essentially this would just break up the results based on two consecutive newlines, but in some cases I don't have two newlines separating the results, which is why I needed to go back up through the logs until I find the first instance of an IP address. Sorry that I didn't make that clear. — LewlSauce, Dec 26 '19 at 22:14
Notice that in the last example it's grabbing the entire block and returning it as an array. It'd be easy to rebuild the string of _join_ed (hint, hint) lines if that's your desire. — the Tin Man, Dec 26 '19 at 22:23
I added a final example that handles missing double-separators. — the Tin Man, Dec 26 '19 at 22:47
