If you need to read the log file online (i.e. as the messages come in), I suggest examining ways to offer the messages via TCP instead of (or in addition to) writing them into a file.
If the remote app uses a logging framework, then this is usually just a few lines in the configuration.
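For example, if the remote app happened to use logback, the configuration could look roughly like this (host name and port are placeholders, not values from your setup):

```xml
<configuration>
  <appender name="SOCKET" class="ch.qos.logback.classic.net.SocketAppender">
    <remoteHost>log-collector.example.com</remoteHost>
    <port>4560</port>
    <!-- retry if the collector is temporarily down -->
    <reconnectionDelay>10000</reconnectionDelay>
  </appender>
  <root level="INFO">
    <appender-ref ref="SOCKET" />
  </root>
</configuration>
```

Other frameworks (log4j2, java.util.logging) have equivalent socket appenders/handlers.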
This will also reduce load on the remote host since it doesn't have to write any data to disk anymore. But that's usually only a problem when the remote process accesses the disk a lot to do its work. If the remote process talks a lot with a database, this can be counterproductive since the log messages will compete with the DB queries for network resources.
On the positive side, this makes it easier to be sure you process each log message at most once (you might lose some if your local listener is restarted).
If that's not possible, run `tail -f <logfile>` via `ssh` (as Vicent suggested in the other answer). See this question for SSH libraries for Java if you don't want to use `ProcessBuilder`.
When you read the files, the hard task is to make sure that you process each log message exactly once (i.e. you don't miss any and you don't process any twice). Depending on how the log rotation works and how your remote process creates log files, you might lose a couple of messages when the files are switched.
If you don't need online processing (i.e. seeing yesterday's messages is enough), try `rsync` to copy the remote folder. `rsync` is very good at avoiding duplicate transfers and it works over `ssh`. That will give you a local copy of all log files which you can process. Of course, `rsync` is too expensive to handle the active log file, so that's the one file you can't examine (hence the limitation that this is only possible if you don't need online processing).
One final tip: Try to avoid transmitting useless log messages. It's often possible to cut the volume by a large factor by filtering the log files with a very simple script before you transfer them.