
I have just found out about R, which seems to be the ideal tool for computing statistics on web server log files. I have looked into several packages such as zoo and plyr, but none of them offers a straightforward solution for aggregating timestamped data.

Is there an R package, tutorial, or piece of documentation that focuses on analyzing log-file-like data, with an emphasis on aggregating time into slices?

Possible use cases (a sketch of the kind of data I have in mind follows this list):

  • average request time per day
  • average requests per session per day
  • get the slowest requests this week
  • ...
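For concreteness, here is a minimal sketch of the kind of data I mean; the column names and values are invented, not taken from a real log:

```r
# Hypothetical parsed web server log, one row per request
set.seed(1)
logs <- data.frame(
  timestamp  = as.POSIXct("2012-07-23 00:00:00") +
                 runif(1000, 0, 7 * 24 * 3600),  # one week of traffic
  session_id = sample(paste0("sess", 1:50), 1000, replace = TRUE),
  request_ms = rlnorm(1000, meanlog = 5)         # request time in ms
)
```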
  • Why do `zoo` and `plyr` not work for you? At the moment your question is rather vague and thus difficult to answer. Can you be more specific? Perhaps post some sample data and show what you have tried so far? – Andrie Jul 24 '12 at 07:10
  • My question is general by its nature ;) I just want some kind of overview; maybe someone has already dived into analyzing logs with R. I have solved my problems "somehow", but not in an elegant way. – d135-1r43 Jul 24 '12 at 07:12
  • Why must you do this with R? Why not use a web server log analyzer program like [AWstats](http://awstats.sourceforge.net)? – Joshua Ulrich Jul 24 '12 at 11:15

1 Answer


This kind of question about processing timestamped data is actually quite common. Because your question is vague, my answer is limited to some pointers. For examples of aggregating time series, see the following answers (which, by the way, are all mine):

These answers all use the same strategy, combining the plyr and ggplot2 packages; that should get you started. Note that these are only the answers of mine I could find in a couple of minutes. There is probably much more to find, especially if you look for more specific questions.
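To make that strategy concrete, here is a minimal sketch, assuming a data frame `logs` with the (invented) columns from the question; it illustrates the approach rather than reproducing the linked answers:

```r
library(plyr)
library(ggplot2)

# Slice the timestamps into days, then aggregate per slice with ddply()
logs$day <- as.Date(logs$timestamp)

# average request time per day
daily <- ddply(logs, .(day), summarise,
               mean_request_ms = mean(request_ms))

# average requests per session per day
per_session <- ddply(logs, .(day, session_id), summarise,
                     n_requests = length(timestamp))
daily_per_session <- ddply(per_session, .(day), summarise,
                           mean_requests = mean(n_requests))

# plot the daily average request time
ggplot(daily, aes(x = day, y = mean_request_ms)) +
  geom_line() +
  geom_point()
```

The key step is turning the timestamps into a grouping variable: `as.Date()` gives day slices, and `cut(logs$timestamp, breaks = "hour")` would give hourly ones; once the slice is a column, `ddply()` handles the per-slice aggregation.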

– Paul Hiemstra