I am trying to create a way to navigate my log files, and the main features I need are:

  1. Search for strings inside a log file (and return the lines where they occur).
  2. Paginate from line x to line y.

Now I was checking Logstash, and it was looking great for my first feature (searching), but not so much for the second one. I was under the impression that I could somehow index each record's line number in the file along with its log information, but I can't seem to find a way to do that.

Is there a Logstash filter to do this, or a Filebeat processor? I can't make it work.

I was also thinking that maybe I could have all my processes log into a database with the processed information, but that is also nearly impossible (or very difficult), because the log handler doesn't know the current log line either.

In the end, the only way I could serve pagination of my log file (through a service) would be to actually open the file, seek to a specific line, and return it. That is not very optimal, as the file could be very big, and I am already indexing it into Elasticsearch (with Logstash) anyway.

My current configuration is very simple:

Filebeat

filebeat.prospectors:
- type: log
  paths:
    - /path/of/logs/*.log
output.logstash:
  hosts: ["localhost:5044"]

Logstash

input {
  beats {
    port => "5044"
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
  }
}

Right now, for example, I am getting an item like:

    {
      "beat": {
        "hostname": "my.local",
        "name": "my.local",
        "version": "6.2.2"
      },
      "@timestamp": "2018-02-26T04:25:16.832Z",
      "host": "my.local",
      "tags": [
        "beats_input_codec_plain_applied",
      ],
      "prospector": {
        "type": "log"
      },
      "@version": "1",
      "message": "2018-02-25 22:37:55 [mylibrary] INFO: this is an example log line",
      "source": "/path/of/logs/example.log",
      "offset": 1124
    }

If I could somehow include a field like line_number: 1 in that item, that would be great, since I could then use Elasticsearch filters to actually navigate through the whole log.
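For example (purely hypothetical, since the line_number field doesn't exist in my events yet), paginating from line 100 to line 120 of one file would become a simple range query:

POST /filebeat-*/_search
{
  "query": {
    "bool": {
      "filter": [
        { "match": { "source": "/path/of/logs/example.log" } },
        { "range": { "line_number": { "gte": 100, "lte": 120 } } }
      ]
    }
  },
  "sort": [ { "line_number": "asc" } ]
}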


If you guys have ideas for different ways to store (and navigate) my logs, please also let me know.

eLRuLL
  • There is a github discussion about a similar request: https://github.com/elastic/beats/issues/1037 – sammy Mar 01 '18 at 07:25
  • Same for Logstash: https://github.com/logstash-plugins/logstash-input-file/issues/7 – Andrei Stefan Mar 01 '18 at 21:28
  • I am not sure what you are looking for, though. You know that to a certain degree Kibana can show you the events in a file. It won't give you a page by page kind of list where you can actually click on a page number, but using time frames you could theoretically look at an entire file. – Andrei Stefan Mar 01 '18 at 21:55
  • Yeah, I could query on a time range, but it would be easier (and more reliable) to query on a line number range. – eLRuLL Mar 01 '18 at 21:56
  • At this point I am not sure this would work. Logstash, Beats, Kibana all have the idea of events over time and that's basically the way things are ordered. Line numbers are more of a text editor kind of functionality. – Andrei Stefan Mar 01 '18 at 22:02

3 Answers


Are the log files generated by you, or can you change the log structure? If so, you can add a counter as a prefix to each line and filter it out with Logstash.

For example, for the line

12345 2018-02-25 22:37:55 [mylibrary] INFO: this is an example log line

your filter would look like this:

filter {
  grok {
    match => { "message" => "%{INT:count} %{GREEDYDATA:message}" }
    overwrite => ["message"]
  }
}

A new field, "count", will be created, which you can then use for your purposes.
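Note that grok extracts "count" as a string by default; to filter on it numerically, a mutate conversion can be added (a sketch extending the filter above):

filter {
  grok {
    match => { "message" => "%{INT:count} %{GREEDYDATA:message}" }
    overwrite => ["message"]
  }
  # Convert the captured string to an integer so Elasticsearch
  # indexes it as a number and range queries behave as expected.
  mutate {
    convert => { "count" => "integer" }
  }
}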

Sergej Schelle

At this moment, I don't think there are any solutions here. Logstash, Beats, Kibana all have the idea of events over time and that's basically the way things are ordered. Line numbers are more of a text editor kind of functionality.

To a certain degree Kibana can show you the events in a file. It won't give you a page by page kind of list where you can actually click on a page number, but using time frames you could theoretically look at an entire file.
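For illustration, such a time-frame view of one file might look like the query below (a sketch only, reusing the index pattern and fields from the question; the time window is made up):

POST /filebeat-*/_search
{
  "size": 100,
  "query": {
    "bool": {
      "filter": [
        { "match": { "source": "/path/of/logs/example.log" } },
        { "range": { "@timestamp": { "gte": "2018-02-26T04:00:00Z", "lt": "2018-02-26T05:00:00Z" } } }
      ]
    }
  },
  "sort": [ { "@timestamp": "asc" } ]
}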

There are similar requests (enhancements) for Beats and Logstash.

Andrei Stefan

First, let me give what is probably the main reason why Filebeat doesn't already have a line number field: when Filebeat resumes reading a file (e.g. after a restart), it does an fseek to continue from the last recorded offset. If it had to report line numbers, it would either need to store this state in its registry or re-read the file and count newlines up to that offset.

If you want to offer a service that lets you paginate through the logs backed by Elasticsearch, you can use the scroll API with a query for the file. You must sort the results by @timestamp and then by offset. Your service would use a scroll query like this to get the first page of results:

POST /filebeat-*/_search?scroll=1m
{
  "size": 10,
  "query": {
    "match": {
      "source": "/var/log/messages"
    }
  },
  "sort": [
    {
      "@timestamp": {
        "order": "asc"
      }
    },
    {
      "offset": "asc"
    }
  ]
}

Then, to get each subsequent page, you use the scroll_id returned by the previous response.

POST /_search/scroll
{
  "scroll": "1m",
  "scroll_id": "DnF1ZXJ5VGhlbkZldGNoBwAAAAAAPXDOFk12OEYw="
}

This will give you all log data for a given file name, even tracking it across rotations. If line numbers are critical, you could produce them synthetically by counting events, starting with the first event that has offset == 0, but I would avoid this because it's very error prone, especially if you ever add any filtering or multiline grouping. A rough sketch of that approach follows below.
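For reference, such synthetic counting could be sketched as a Logstash ruby filter. The in-memory counter resets on restart and assumes a single pipeline worker, which illustrates the fragility:

filter {
  ruby {
    # Sketch only: keep an in-memory line counter per source file.
    # State is lost on pipeline restart, and the numbers go wrong
    # with multiple workers, dropped events, or multiline grouping.
    init => "@line_counters = Hash.new(0)"
    code => "
      source = event.get('source')
      @line_counters[source] += 1
      event.set('line_number', @line_counters[source])
    "
  }
}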

A J