53

After reading an article about the subject from O'Reilly, I wanted to ask Stack Overflow for their thoughts on the matter.

Thomas Owens

6 Answers

63

Write locally to disk, then batch insert to the database periodically (e.g. at log rollover time). Do that in a separate, low-priority process. More efficient and more robust...

(Make sure that your database log table contains a column for "which machine the log event came from" by the way - very handy!)
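A minimal sketch of that pattern, with Python's sqlite3 standing in for the real database; the tab-separated log format, table layout, and function name are assumptions for illustration, not from the answer:

import socket
import sqlite3

def load_rotated_log(db_path, log_path):
    # Runs in a separate, low-priority process at log rollover time.
    # Assumes each well-formed line is "timestamp<TAB>level<TAB>message".
    host = socket.gethostname()  # "which machine the log event came from"
    with open(log_path, encoding="utf-8") as f:
        rows = [line.rstrip("\n").split("\t", 2) + [host]
                for line in f if line.strip()]
    conn = sqlite3.connect(db_path)
    try:
        conn.execute("""CREATE TABLE IF NOT EXISTS logs
                        (timestamp TEXT, log_level TEXT,
                         message TEXT, machine TEXT)""")
        # One big transaction instead of one round trip per log call.
        conn.executemany("INSERT INTO logs VALUES (?, ?, ?, ?)", rows)
        conn.commit()
    finally:
        conn.close()

if __name__ == "__main__":
    load_rotated_log("logs.db", "app.log.1")  # hypothetical paths

A real loader would also handle malformed lines and remember which rotated files it has already imported.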

Jon Skeet
  • I like that idea better. And, if you needed the data in the DB faster, you could do it nightly, if your log rotations aren't nightly to begin with. – Thomas Owens Nov 14 '08 at 15:07
  • Wouldn't that mean the log info in the database is always kind of out of date, and thus would serve mainly as an archive instead of something you would use to diagnose recent problems? – MusiGenesis Nov 14 '08 at 15:24
  • In terms of diagnosing today's issues, you'll face a challenge like that no matter how you handle current logs (tail/grep/etc or Windows alternatives so as not to interfere with logging). Even for a problem like that, you might want to look at historical data to see if it happened before. – Dave DuPlantis Nov 14 '08 at 15:45
  • For a busy application, I'd probably try to rotate every 15 minutes or maybe every hour - but it also depends on the exact app. If you've got an app which doesn't log much and which is "quiet" overnight, that would be a good time to flush the logs. – Jon Skeet Nov 14 '08 at 15:49
  • Of course, you could try to get smart - if you logged anything "serious" the collector/inserter could flush to the database earlier :) – Jon Skeet Nov 14 '08 at 15:50
  • @Dave: I was thinking more that if your server logs immediately to the database and only logs locally if the db is unreachable, then the db is always up-to-date (for the most part). – MusiGenesis Nov 14 '08 at 16:00
  • Seems way too complicated – Piotr Perak May 08 '14 at 08:59
  • @Peri: It's far from complicated. Your suggestion of writing straight to the database is likely to have a very significant performance impact if you have a reasonable amount of logging. You really don't want to end up in a situation where you can't add more diagnostic information because each log call is going to involve a database round trip. – Jon Skeet May 08 '14 at 11:57
  • @JonSkeet: As I described in my answer at the bottom, 'if it doesn't slow down your DB'. :) In my experience I've never seen this become a performance problem in my systems. And we used Castle.Windsor interceptors on all the layers of the system to do logging for us, so you can imagine we had a lot of log data. But maybe the system wasn't as big as you mean. About 400 users working at the same time. I wonder how you parse log files and put them into the database in order to do selects as I described in my answer. I guess it can be simple or hard depending on what format you use in your log files. – Piotr Perak May 08 '14 at 12:14
  • And isn't it faster to write to the DB than to the HD? Isn't a db optimized to do writes as fast as possible (asynchronously, probably), faster than any logging framework's writes to disk? – Piotr Perak May 08 '14 at 12:17
  • @Peri: Yes, you said "if it doesn't slow down your DB" but I think it's very *likely* to slow down your DB in large scale systems. 400 users isn't a particularly big system. As for writes to your database being faster than writes to the local disk - if you don't have any constraints *and if you're local to the database* then maybe. I typically work on systems where all data is replicated and is always *at least* one network hop away, possibly not even in the same data center. As for parsing the log files - I'd probably make the log just a sequence of protocol buffers, so the parsing is easy... – Jon Skeet May 08 '14 at 12:25
  • @JonSkeet: One more question. A client calls to say that something critical doesn't work. What do you do? I do a SELECT and know what happened. It doesn't matter on which of the load-balanced servers the error occurred. – Piotr Perak May 08 '14 at 12:36
  • @Peri: Exactly the same - the data still ends up in the database, it's just that I'm taking it out of the path of the actual request. This delays the time between the request and the log being available, but you can make that delay pretty small - upload the new logs every minute, if you want. You can tweak that to whatever extent you need to, balancing all kinds of different priorities. (Another option is to make each server able to serve its temporary logs internally, and call out to all of those servers to get the latest information, of course.) – Jon Skeet May 08 '14 at 12:42
  • @JonSkeet: but if you upload logs to the db every minute then aren't you doing what I am? I can imagine it could be even worse than my solution, because every minute there will be a lot of write requests to the db (write peaks). When you write directly to the db, the load is balanced across that minute. As SO informs us, comments are not meant for discussion :) so I'd love to see some blog/article on this topic by you. Maybe that would change my mind. – Piotr Perak May 08 '14 at 12:52
  • @Peri: No, for two reasons: 1) it's out of the bottleneck of the request. You're not slowing down the *request* for the database access. 2) you end up with far fewer (but larger) database writes, which I'd *expect* the database to be able to handle far more efficiently than a huge number of small transactions. (It could be a separate database, too - one which is *just* for logs, with different performance characteristics.) I don't have the time to write a decent blog post on this - which I'd expect to include all kinds of test setups etc, I'm afraid. – Jon Skeet May 08 '14 at 12:55
  • @JonSkeet: 1) Logging to the database can/should be asynchronous, depending on the system. Log.Info("message") doesn't necessarily mean you wait for the db write while handling the user request. It doesn't slow down the request at all. – Piotr Perak May 08 '14 at 12:59
  • @Peri: *If* you do it entirely asynchronously, then we are describing the same thing - the only difference being that if the application dies completely, my system is more likely to be able to have captured the last few moments. Note that your answer doesn't mention asynchrony in any way, and the *manner* in which it's asynchronous is important. Are you suggesting any form of batching to reduce the transaction count? What indication is there that logging is still working, etc? Oh, and one other important point: if the database connection goes down, how do you diagnose that? :) – Jon Skeet May 08 '14 at 13:02
  • @JonSkeet: I'm only suggesting that you can do logging to the db asynchronously (if there is a need) any way you choose. For one example you can look at 'Ultra-Fast ASP.NET' by Richard Kiessig. It shows exactly this: logging to the db in a background thread so you don't block the request. As to what happens when the db dies - then we logged to a file. I don't work there anymore so I can't check it, but I think NLog already had built-in support for logging to disk in case of an error logging to the db. So it was only a configuration setting. No custom code. – Piotr Perak May 08 '14 at 13:16
  • let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/52307/discussion-between-jon-skeet-and-peri) – Jon Skeet May 08 '14 at 13:21
  • Sometimes, comments provide so much insight into the subject! – Swapnil Aug 26 '14 at 06:58
10

I'd say no, given that a fairly large percentage of server errors involve problems communicating with databases. If the database were on another machine, network connectivity would be another source of errors that couldn't be logged.

If I were to log server errors to a database, it would be critical to have a backup logger that wrote locally (to an event log or file or something) in case the database was unreachable.
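A rough sketch of that backup-logger idea, with sqlite3 standing in for the server database; the table layout, fallback file name, and function name are illustrative assumptions:

import datetime
import sqlite3

FALLBACK_FILE = "fallback.log"  # hypothetical local path

def log_event(conn, level, message):
    ts = datetime.datetime.now(datetime.timezone.utc).isoformat()
    try:
        conn.execute(
            "INSERT INTO logs (timestamp, log_level, message) VALUES (?, ?, ?)",
            (ts, level, message))
        conn.commit()
    except sqlite3.Error:
        # Database unreachable or insert failed: fall back to local disk
        # so the event (perhaps the db error itself) is never lost.
        with open(FALLBACK_FILE, "a", encoding="utf-8") as f:
            f.write(f"{ts}\t{level}\t{message}\n")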

MusiGenesis
  • That is a good point. Sure, you can check to see if everything is good before logging, but what if something happens after the check? Or what do you do with the data you meant to log after the check? Two things not addressed in the article, at least from what I saw. – Thomas Owens Nov 14 '08 at 15:07
4

Log to the DB if you can, and if it doesn't slow down your DB :)

It's way, way faster to find anything in a DB than in log files, especially if you think ahead about what you will need. Logging to the db lets you query the log table like this:

select * from logs
where log_name = 'wcf' and log_level = 'error'

Then, after you find the error, you can see the whole path that led to it:

select * from logs
where contextId = 'what you get from previous select' order by timestamp

How will you get this info if you log it in text files?

Edit: As JonSkeet suggested, this answer would be better if I stated that one should consider making logging to the db asynchronous. So I state it :) I just didn't need it. For an example of how to do it, you can check "Ultra-Fast ASP.NET" by Richard Kiessig.
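A minimal sketch of what such asynchronous, background-thread db logging could look like (sqlite3 and all names here are stand-ins, not taken from the book or this answer):

import queue
import sqlite3
import threading

class AsyncDbLogger:
    def __init__(self, db_path):
        self._q = queue.Queue()
        self._thread = threading.Thread(target=self._worker,
                                        args=(db_path,), daemon=True)
        self._thread.start()

    def info(self, log_name, message, context_id):
        # Returns immediately; the request thread never waits on the DB.
        self._q.put(("info", log_name, message, context_id))

    def close(self):
        self._q.put(None)  # sentinel: drain the queue, then stop
        self._thread.join()

    def _worker(self, db_path):
        # The connection lives on the worker thread that uses it.
        conn = sqlite3.connect(db_path)
        conn.execute("""CREATE TABLE IF NOT EXISTS logs
                        (log_level TEXT, log_name TEXT,
                         message TEXT, contextId TEXT)""")
        while True:
            item = self._q.get()
            if item is None:
                break
            conn.execute("INSERT INTO logs VALUES (?, ?, ?, ?)", item)
            conn.commit()
        conn.close()

The selects above work unchanged against such a table; batching the inserts, as Jon Skeet suggests in the comments, would reduce the transaction count further.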

Piotr Perak
3

If the database is your production database, this is a horrible idea. You will have issues with backups, replication, and recovery: more storage for the DB itself, for the replicas (if any), and for the backups; more time to set up and restore replication; more time to verify backups; more time to recover the DB from backups.

2

Think about a properly set up database that uses RAM for reads and writes. This is much faster than writing to disk, and it avoids the disk I/O bottleneck you see when serving large numbers of clients, where threads start blocking because the OS tells them to wait on currently executing threads that are using all the available I/O handles.

I don't have any benchmarks to prove this, although my latest application is rolling with database logging. It will have a failsafe, as mentioned in one of these responses: if the database connection cannot be created, create a local database (H2, maybe?) and write to that. Then a periodic check of database connectivity can re-establish the connection, dump the local database, and push it to the remote database.

This could be done during off hours if you do not have an H-A site.
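An illustrative sketch of that failsafe loop, with sqlite3 standing in for both the local store (H2 in the text) and the remote database; connect_remote() and every other name here are hypothetical:

import sqlite3
import time

def connect_remote():
    # Placeholder: a real system would connect to the remote database
    # server here and raise on failure.
    return sqlite3.connect("remote_logs.db")

def sync_local_to_remote(local_path, interval_s=60.0):
    while True:
        time.sleep(interval_s)  # periodic connectivity check
        try:
            remote = connect_remote()
        except sqlite3.Error:
            continue  # still down; keep writing to the local database
        local = sqlite3.connect(local_path)
        rows = local.execute(
            "SELECT timestamp, log_level, message FROM logs").fetchall()
        if rows:
            remote.executemany(
                "INSERT INTO logs (timestamp, log_level, message) "
                "VALUES (?, ?, ?)", rows)
            remote.commit()
            local.execute("DELETE FROM logs")  # dump the local store
            local.commit()
        local.close()
        remote.close()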

Sometime in the future I hope to develop benchmarks to demonstrate my point.

Good Luck!

Andrew Carr
2

It probably isn't a bad idea if you want the logs in a database, but I would say not to follow the article's advice if you have a lot of log entries. The main issue is that I've seen file systems have trouble keeping up with the logs from a busy site, let alone a database. If you really want to do this, I would look at loading the log files into the database after they are first written to disk.

carson