5

I recently replaced Google Analytics with the self-hosted analytics tool Piwik.

This means that every time someone connects to my website http://www.mywebsite.com, a JavaScript tracking code is executed on the client, which calls my Piwik server at http://www.mywebsite.com/piwik/piwik.php

Result:

  1. My server's Apache access.log gets a line for the request to http://www.mywebsite.com, which is normal.
  2. My Piwik database stores information about this visit, which is also normal.
  3. My server's Apache access.log gets another line because my Piwik server received a tracking request (sent by the client's JS).

Logging part 3 is clearly too much! Since Piwik was installed, my access.log has doubled in size!

How can I stop Apache from logging in access.log the requests to http://www.mywebsite.com/piwik/piwik.php, i.e. client JS tracking code <--> Piwik server?

Basj
  • You could just post-process your logfile with `grep`...? – larsks Oct 23 '16 at 18:31
  • 1
    I think it would be better to not log these requests instead of logging them and postprocessing the log to remove them. – Basj Oct 23 '16 at 18:34

3 Answers

10

The solution is to disable logging of certain requests (for example, in /etc/apache2/sites-available/000-default.conf on Debian 8):

<VirtualHost *:80>
  ServerName www.mywebsite.com
  DocumentRoot /home/www/mywebsite
  ...
  SetEnvIf Request_URI "^/piwik(.*)$" dontlog
  CustomLog ${APACHE_LOG_DIR}/other_vhosts_access.log vhost_combined env=!dontlog
</VirtualHost>
Basj
  • 2
    I was able to use this concept to stop logging of images, css, and js. `SetEnvIf Request_URI "^.*(\.jpg|\.png|\.gif|\.css|\.js|\.svg).*" dontlog` – ksaylor11 Apr 17 '20 at 19:40
  • 1
    This works perfectly. In my case I needed to match any path ending with /path/..., ignoring what comes after. For example, I did not want to log anything like the examples below: /before/path/?id=123 /path/?id=456 /before/path/ So I simply needed to remove the `^`. The full rule: `SetEnvIf Request_URI "/path(.*)$" dontlog` – Lior Gross Sep 10 '21 at 00:14
4

The Apache manual contains a section on conditional logging:

https://httpd.apache.org/docs/2.4/logs.html

What you need to do is set an environment variable when a condition is met (here, when the path is /piwik/piwik.php). You can then reference that environment variable in the Apache log configuration.
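
A minimal sketch of that approach, assuming a `combined` log format is already defined (as in Debian's default configuration) and that the log lives at `${APACHE_LOG_DIR}/access.log`. Both are assumptions to adjust for your own vhost, and `dontlog` is just an arbitrary variable name:

```apache
# Set the "dontlog" variable only for the Piwik tracking endpoint.
# Request_URI generally excludes the query string, so tracking requests
# match regardless of the parameters they carry.
SetEnvIf Request_URI "^/piwik/piwik\.php$" dontlog

# Log every request except those for which "dontlog" is set.
CustomLog ${APACHE_LOG_DIR}/access.log combined env=!dontlog
```

Anchoring the pattern to the exact script path keeps other requests under /piwik/ (for example the Piwik admin interface) in the log.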

Daniel Scott
0

Disabling logging of tracking requests in the Apache log file is not the best idea. If Piwik crashes for some reason or tracking stops working for a period of time (e.g. over the weekend), you will lose your data.

Apache logs can save you here: you can replay your traffic using LogAnalytics: http://piwik.org/log-analytics/#logfile

It is better to have a reasonable log retention policy than to remove data from your log.

Sebastian Piskorski
  • `If your Piwik will crash for some reason or your tracking will not work for some period of time (eg. over the weekend) you will loose your data.` No, you probably misread the question. I don't want to remove traditional apache `access.log` logging of my website (part 1 in my question). I only want to remove *the logging of the Piwik tracking packets* themselves (part 3), i.e. access to `http://www.mywebsite.com/piwik/piwik.php`. Of course I want to keep 1. and 2., but not 3. – Basj Oct 29 '16 at 13:17
  • I did understand your question. Tracking packets might be valuable data for Piwik and I would keep them as a backup. But this is your server. – Sebastian Piskorski Oct 29 '16 at 13:28
  • Ok sorry then, I thought you didn't read in detail ;) `Apache logs can save you here, you can then replay your traffic using LogAnalytics` => yes, but for this (i.e. `import_logs.py`) we use the part 1. in my question (i.e. traditional `access.log` logging of the website access itself). Part 3. is never used for that, is that right? In short, logging access to the website itself, ok, but logging access to a tracking packet of the website, this is redundant, in my opinion. But I am interested to know other people's policy about this. – Basj Oct 29 '16 at 13:36
  • Some requests might be redundant, like the one for the `piwik.js` resource, but all tracking requests are valuable for `LogAnalytics`. Every tracking request carries its data payload in the URL `query`. You need it for event tracking. Also, if you want `LogAnalytics` to be able to track a user across pages, it will need the user id `_id`, which is passed in the seemingly redundant tracking request. – Sebastian Piskorski Oct 29 '16 at 14:46