96

I installed Logstash to parse Apache log files. It took me quite a while to get the settings right, and I always tested on real logs. I noticed (as the documentation says) that Logstash "remembers" where it was in a file. Now my settings are OK and I would like Logstash to "forget". This seems harder than I thought. I already did the following:

  • used: start_position => "beginning"

  • deleted the complete "data" folder from Elasticsearch (after stopping it first)

  • looked at which files were opened by Logstash with lsof -p PID and deleted everything that looked promising (in my case /tmp/jffi*.tmp)

Still, Logstash does not forget and only parses "fresh" files in the folder where the logs are.

Any ideas?

Brad
Christophe Claude

14 Answers

142

By default Logstash writes the position it last was at to a sincedb file, which usually resides in $HOME/.sincedb. Logstash can be fooled into believing it has never parsed the logfile by specifying /dev/null as the sincedb_path.

Here is the relevant part of the documentation for the file input:

Where to write the since database (keeps track of the current position of monitored log files). Defaults to the value of environment variable "$SINCEDB_PATH" or "$HOME/.sincedb".

Config Example

input {
    file {
        path => "/tmp/logfile_to_analyse"
        start_position => "beginning"   # read from the start on first contact with a file
        sincedb_path => "/dev/null"     # do not persist the read position anywhere
    }
}
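A quick way to try such a config (assuming a local Logstash install and that the snippet above is saved as reparse.conf; the filename is just a placeholder):

bin/logstash -f reparse.conf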
flazzarini
  • On Windows you can use `sincedb_path => "NUL"` to get the same effect. Details [here](http://stackoverflow.com/questions/313111/dev-null-in-windows) – Chris Magnuson Jan 20 '16 at 13:19
  • If the files are quite old (more than 24h), it is very useful to add the option `ignore_older => 0` so Logstash will pick them up regardless of their age. By default, files older than 24h are ignored. – mtfk Mar 01 '16 at 06:32
  • @mtfk: Wow, awesome find! Thanks for pointing out that `ignore_older => 0` works in Logstash! I've been jammed by the same problem as the questioner. It seems to be a non-obvious find (googling "ignore_older" and "logstash" only brings up pages on Filebeat; I couldn't find any trace of how to deal with this in Logstash). – Mike Lutz Mar 03 '16 at 22:40
  • How do you add this while using Filebeat? – Sunilkumar Ramamurthy Mar 13 '18 at 03:35
  • @SunilkumarRamamurthy I believe that if you leave out the option ``ignore_older`` in your Filebeat configuration, Filebeat is forced to read the entire file again: https://www.elastic.co/guide/en/beats/filebeat/current/configuration-filebeat-options.html#ignore-older – flazzarini Mar 13 '18 at 14:03
21

The file plugin stores its "tailing" history in a sincedb file, by default under $HOME/.sincedb*; see http://logstash.net/docs/1.3.3/inputs/file#sincedb_path

The sincedb file contains lines that look like:

[inode] [major device number] [minor device number] [byte offset]

So, if you want to parse a complete file again, you need to:

  • delete the sincedb files,
  • OR delete only the corresponding line in the sincedb file (check the inode number of your file first with ls -i yourFile | awk '{print $1}'); see the shell sketch after this list,
  • and then restart Logstash.
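A minimal shell sketch of the second option, assuming the default $HOME/.sincedb location from the documentation quoted above and a hypothetical Apache log path:

# find the inode of the log file you want Logstash to re-read
INODE=$(ls -i /var/log/apache2/access.log | awk '{print $1}')

# keep every sincedb line except the one for that inode
grep -v "^$INODE " "$HOME/.sincedb" > /tmp/sincedb.tmp && mv /tmp/sincedb.tmp "$HOME/.sincedb"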

With the setting start_position => "beginning", Logstash will analyze the whole file.

Example of a sincedb file:
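Each line pairs a file's inode and device numbers with the byte offset already processed; the values below are made up purely for illustration:

271953 8 1 440938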

Giacomo1968
yesnault
  • Regarding `start_position => "beginning"`, the documentation says: "This option only modifies 'first contact' situations where a file is new and not seen before. If a file has already been seen before, this option has no effect." – Brad Jan 21 '15 at 22:02
11

Logstash keeps its record in $HOME/.sincedb_*. You can delete all the .sincedb_* files and restart Logstash; it will then reparse the files.
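A sketch of that reset, assuming a systemd-managed install and that the sincedb files live under the home directory of the user Logstash runs as:

sudo systemctl stop logstash
rm -f $HOME/.sincedb*         # remove the recorded positions
sudo systemctl start logstash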

Giacomo1968
Ban-Chuan Lim
10

Combining all the answers, I guess this is the best way to reparse files; I did the same for my testing.

input {
  file {
    path => "/tmp/access_log"
    start_position => "beginning"   # quoted, as in the documentation examples
    sincedb_path => "/dev/null"     # never remember the position
    ignore_older => 0               # per the comments above, do not skip files older than 24h
  }
}

For a quick test, instead of ignore_older, you can also touch /tmp/access_log to update the file's timestamp.
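For example, for the path used in the config above:

touch /tmp/access_log    # bumps the mtime so the file no longer counts as older than 24h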

vikas027
5

If you are using logstash-forwarder, check your home directory for a .logstash-forwarder file instead:

{
  "/var/log/messages": {
    "source": "/var/log/messages",
    "offset": 43715,
    "inode": 12967,
    "device": 51776
  }
}
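A hedged sketch of resetting the forwarder's state (the service name is an assumption; stop the forwarder before deleting its registry file):

sudo service logstash-forwarder stop
rm ~/.logstash-forwarder            # forget all recorded offsets
sudo service logstash-forwarder start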
elwarren
3

After deleting $HOME/.sincedb_* it still wasn't ingesting data for me.

After trying a bunch of things, I removed all but the main .conf file from /etc/logstash/conf.d and restarted Logstash, and everything worked. I can only assume there was something in one of the .conf files that Logstash was silently hanging on.
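One way to spot which .conf file Logstash is unhappy with is to validate the configuration before restarting; a sketch, assuming Logstash 5+ (older releases used the --configtest flag instead):

bin/logstash --config.test_and_exit -f /etc/logstash/conf.d/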

Giacomo1968
Seth
  • As I recall, I later turned on some debugging flag and it told me why it was angry rather than silently hanging. I think it was looking for a version number in the data, but sometimes the data did not have a number in it. The check to find out what the number was would crash if it wasn't a number, so I had to first test that it was a number and then ask what number it was. – Seth Oct 12 '16 at 16:19
1

Reparsing every time is actually very costly if the file contains a lot of data, so you need to be careful before doing this. If you want to force Logstash to reparse the file, set this parameter inside the input block:

sincedb_path => "/dev/null" 

With this option, the .sincedb file is not stored and Logstash reparses the file every time. But if you want to reparse only occasionally rather than every time, you can instead manually delete the sincedb file that is created when the file is parsed. It is generally a hidden file in your home directory (or in the root user's home directory if you run Logstash as root). You can also set sincedb_path to some other location to make the file easier to find:

sincedb_path => "/home/shubham/sinceDB/productsSince.db"
1

If you want to avoid messing with the Logstash options, I've found that renaming or removing the existing log file and creating a new file from the old file's contents will trick Logstash into re-indexing it.

GreensterRox
0

I found it in my home directory, but after deleting it Logstash still refused to re-pick the existing log files. The way I got it to work was to add

sincedb_path => "/opt/elk/sincedb/"  

to my file plugin. I think that to reset it each time, you can just change the sincedb_path.

Joseph
0

If you installed Filebeat from the tar.gz archive, you can delete the file $FilebeatPath/data/registry/filebeat/data.json and rerun Filebeat.

L.T
0

Try deleting the /var/lib/logstash folder in your environment.

0

As seen on: https://www.elastic.co/guide/en/logstash/current/plugins-inputs-file.html#plugins-inputs-file-sincedb_path

There you can see that Logstash saves a sincedb file to keep track of which files it has already seen and up to which point each one has been processed.

If you want to get rid of the existing sincedb file and you have not defined sincedb_path yourself, you can find it in

<path.data>/plugins/inputs/file

By default <path.data> holds the value

LOGSTASH_HOME/data

By default LOGSTASH_HOME holds the value

/var/lib/logstash

It is best to define sincedb_path yourself if you want to have full control of it.
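A sketch of clearing it for a package install, assuming the default paths above and a systemd-managed service:

sudo systemctl stop logstash
ls -a /var/lib/logstash/plugins/inputs/file/                  # the since databases live here
sudo sh -c 'rm -f /var/lib/logstash/plugins/inputs/file/.sincedb*'
sudo systemctl start logstash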

YouryDW
0

I would suggest:

sincedb_clean_after => 0
start_position => "beginning"
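In context, a minimal file input using those settings might look like this (the path is just a placeholder):

input {
  file {
    path => "/tmp/logfile_to_analyse"
    start_position => "beginning"
    sincedb_clean_after => 0      # as suggested above: do not keep old sincedb entries around
  }
}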
Martin Ma
-1

In Logstash version 5, the new sincedb directory is in

<path.data>/plugins/inputs/file

The path.data setting is defined in logstash.yml.
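For example, in logstash.yml (this value is the package-install default):

path.data: /var/lib/logstash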

OneCricketeer
foo01