
Currently I am using the file input plugin to go over my log archive, but it is not the right solution for me because the file input plugin inherently expects a file to be a stream of events, not a static file. This is causing me a great deal of trouble because my log archive has 100,000+ log files, and Logstash opens a handle on all of these files even though they are never going to change.

I am facing the following problems:

1) Logstash fails with the problem mentioned in the linked SO question.
2) With that many open file handles, the log archive storage is getting very slow.

Does anybody know a way to tell Logstash to treat files as static, or, once a file has been processed, not to keep a file handle open on it?

In a Logstash JIRA bug report, I was told to write my own plugin, along with some other suggestions that won't help me much.

  • I don't think logstash really holds on to all the file descriptors. What I observed is that logstash just opens and closes them constantly, at least in version 1.4.2. – Jackson Tale Oct 30 '14 at 16:21
  • I'm certain that I saw logstash keep thousands of file handles open. It might be that logstash only keeps them open for a while, but considering the number of files I had, it tried to open all of them at once and reached the OS's max file descriptor limit. – Rocky Oct 31 '14 at 17:45
  • My max file descriptor limit is 5000. I use logstash to monitor 15000 files in a folder and it didn't have a problem, at least with 1.4.2. I also used `lsof` to see how many files logstash had open, and it just showed some logstash files. With strace I could see logstash opening and closing files rapidly. I don't know what's really inside the logstash implementation, and I hope what I observed is true. Maybe you can just try setting the max open file descriptor limit to a high value and run. – Jackson Tale Nov 01 '14 at 13:35
  • I have seen logstash 1.4.2 open up about 35k file handles on Ubuntu. – mooreds Apr 29 '15 at 21:17
  • As my case was a log archive where I had to keep all the logs, I wrote a quick plugin to deal with it: it copies the file, reads the copy, and then deletes the copy. That solved my problem. – Rocky Apr 30 '15 at 23:34
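For reference, the copy-read-delete idea from the last comment can be sketched like this in Python (an illustration only, not Rocky's actual plugin, which was written as a Logstash input; the archive path is a placeholder):

import os
import shutil
import tempfile
from pathlib import Path

ARCHIVE_DIR = Path("/log/archive")  # hypothetical archive location

def read_via_copy(src):
    # Copy the archived file, read the copy, then delete the copy,
    # so no handle is ever held open on the archive itself.
    fd, tmp = tempfile.mkstemp(suffix=src.suffix)
    os.close(fd)
    try:
        shutil.copyfile(src, tmp)
        with open(tmp, errors="replace") as f:
            for line in f:
                yield line.rstrip("\n")
    finally:
        os.remove(tmp)

for path in sorted(ARCHIVE_DIR.rglob("*.log")):
    for line in read_via_copy(path):
        pass  # hand each line to the rest of your pipeline here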

2 Answers


The Logstash file input can process a static file. You need to add this configuration:

file {
     path => "/your/logs/path"
     # read newly discovered files from the beginning instead of from the end
     start_position => "beginning"
}

After adding start_position, Logstash reads the file from the beginning. Please refer to the file input documentation for more information. Remember that

this option only modifies “first contact” situations where a file is new and not seen before. If a file has already been seen before, this option has no effect.
Otherwise, set your sincedb_path to /dev/null.
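Put together, a configuration for reprocessing old files might look like this (a sketch: sincedb_path => "/dev/null" discards the record of how far each file has been read, so files are read from the beginning again on every restart):

file {
     path => "/your/logs/path"
     # read newly discovered files from the top rather than tailing the end
     start_position => "beginning"
     # do not persist read positions; already-seen files are re-read on restart
     sincedb_path => "/dev/null"
}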

For the first question, I have answered in the comment: please try to raise the maximum number of open files. My suggestion is that you write a script that copies log files into the Logstash monitor path and constantly moves them back out; you have to estimate the time Logstash needs to process a log file. A rough sketch follows.
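Here is a minimal feeder script in Python along those lines (the paths, batch size, and estimated processing time are all assumptions to tune for your setup):

import shutil
import time
from pathlib import Path

ARCHIVE_DIR = Path("/log/archive")          # static archive (assumption)
MONITOR_DIR = Path("/var/log/logstash-in")  # the file input's path (assumption)
BATCH_SIZE = 100                            # files staged per round
PROCESS_SECONDS = 60                        # estimated ingest time per batch

files = sorted(p for p in ARCHIVE_DIR.rglob("*") if p.is_file())
for i in range(0, len(files), BATCH_SIZE):
    staged = []
    for src in files[i:i + BATCH_SIZE]:
        dst = MONITOR_DIR / src.name
        shutil.copy2(src, dst)   # copy in, so the archive stays untouched
        staged.append(dst)
    time.sleep(PROCESS_SECONDS)  # wait for Logstash to process the batch
    for dst in staged:
        dst.unlink()             # move the batch back out of the watch dir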

  • I am pretty sure logstash does process static files. What I meant by static files is that logstash shouldn't hold a handle on files it has already processed once. Because it holds handles, I had to increase the open file handle limit on my Linux box by 50,000, and it is deteriorating the performance of Linux as well as of the storage. My question was more: is there any way to let logstash know that once a file is processed / parsed and pushed to ES, it should forget about it, not leave the file handle open, and not expect it to be updated? Or is there any custom plugin available that will do this job? – Rocky Jun 05 '14 at 11:09
  • No. So far logstash can't do what you need. As I say at the end of the answer, you have to write a script to move the files out of the monitored directory. Otherwise you have to modify the file plugin code. – Ban-Chuan Lim Jun 05 '14 at 14:18

Look out for this; also turn on -v and --debug for Logstash:

{:timestamp=>"2016-05-06T18:47:35.896000+0530",
 :message=>"_discover_file: /datafiles/server.log: skipping because it was last modified more than 86400.0 seconds ago",
 :level=>:debug, :file=>"filewatch/watch.rb", :line=>"330",
 :method=>"_discover_file"}

The solution is to touch the file or to change the ignore_older setting, for example:
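A sketch of the latter (the path is a placeholder; the value is in seconds, raised well above the 86400-second one-day default seen in the log message):

file {
     path => "/datafiles/server.log"
     start_position => "beginning"
     # files modified longer ago than this many seconds are skipped,
     # so raise it to cover the age of the oldest archived file
     ignore_older => 31536000
}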
