
To respect the privacy of my users I'm trying to anonymize their IP addresses in nginx log files.

One way to do this would be defining a custom log format, like so:

log_format noip '127.0.0.1 - [$time_local]  '
    '"$request" $status $body_bytes_sent '
    '"$http_referer" "$http_user_agent" $request_time';

This method has two downsides: I can't distinguish between different users, and I can't use geolocation tools.

The best thing would be to 'shorten' the IP address (87.12.23.55 would become 87.12.23.1).

Is it possible to achieve this using nginx configuration scripting?

Dunedan
endzeit
  • Related (apache): http://serverfault.com/q/343031/75968 – cweiske Dec 23 '11 at 10:42
  • There is a new nginx article about how to use nginScript for exactly this purpose: https://www.nginx.com/blog/data-masking-user-privacy-nginscript/ – hyperknot May 26 '18 at 09:33
  • I used to believe this was necessary for GDPR reasons, but logs fall under legitimate interest (in this case for cybersecurity reasons). – Martin Braun Apr 11 '23 at 22:18

4 Answers


Even though there is already an accepted answer, that solution does not seem to be valid.

nginx has the log_format directive, whose context is http. This means log_format can only be (validly) set within the http {} section of the config file, NOT within the server sections!

On the other hand, the if directive has a context of server and location.

So we can NOT use “if” and “log_format” together within a server section (which is what the accepted solution does).

So if is not helpful here; besides, if is evil (http://wiki.nginx.org/IfIsEvil)! We need something that works in the http context, because that is the only place where log_format can be validly defined, and it is the only place outside the server blocks in which our virtual hosts are defined.

Luckily, nginx has a map directive, which is also valid in the http context! map translates a source value into a new value (exposed as a variable, which can be used in a log_format directive). And the good news: it also works with regular expressions.

So let’s map our IPv4 and IPv6 addresses to anonymized addresses. This has to be done in 3 steps, since map (in nginx versions before 1.11.0) cannot combine returned values: a value can be a string or a variable, but not a combination of both.

First, we grab the part of the IP we want to keep in the log files; the second map returns the string that stands in for the anonymized part; and the third map joins the two together again.

Here are the rules which go into the http {} context:

map $remote_addr $ip_anonym1 {
 default 0.0.0;
 "~(?P<ip>(\d+)\.(\d+)\.(\d+))\.\d+" $ip;
 "~(?P<ip>[^:]+:[^:]+):" $ip;
}

map $remote_addr $ip_anonym2 {
 default .0;
 "~(?P<ip>(\d+)\.(\d+)\.(\d+))\.\d+" .0;
 "~(?P<ip>[^:]+:[^:]+):" ::;
}

map $ip_anonym1$ip_anonym2 $ip_anonymized {
 default 0.0.0.0;
 "~(?P<ip>.*)" $ip;
}

log_format anonymized '$ip_anonymized - $remote_user [$time_local] ' 
   '"$request" $status $body_bytes_sent ' 
   '"$http_referer" "$http_user_agent"';

access_log /var/log/nginx/access.log anonymized;

After adding this to your nginx.conf, remember to reload nginx. Your log files should then contain anonymized IP addresses, provided you use the “anonymized” log format (the format parameter of the access_log directive).
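To sanity-check what the three maps produce, here is a rough Python sketch that mimics them with the same regular expressions. It is only an emulation, assuming Python's re behaves like nginx's PCRE for these simple patterns and that the first matching regex wins, as in a map; the function names are hypothetical and not part of nginx.

```python
import re

# Same patterns as the map blocks above; Python's re accepts the (?P<name>...) syntax.
IPV4 = re.compile(r"(?P<ip>(\d+)\.(\d+)\.(\d+))\.\d+")
IPV6 = re.compile(r"(?P<ip>[^:]+:[^:]+):")


def ip_anonym1(remote_addr: str) -> str:
    """Emulates map #1: keep the leading part of the address."""
    for pattern in (IPV4, IPV6):  # regex variants are tried in order
        match = pattern.search(remote_addr)
        if match:
            return match.group("ip")
    return "0.0.0"  # default


def ip_anonym2(remote_addr: str) -> str:
    """Emulates map #2: pick the suffix that replaces the dropped part."""
    if IPV4.search(remote_addr):
        return ".0"
    if IPV6.search(remote_addr):
        return "::"
    return ".0"  # default


def ip_anonymized(remote_addr: str) -> str:
    """Emulates map #3: the catch-all regex just returns the concatenation."""
    return ip_anonym1(remote_addr) + ip_anonym2(remote_addr)


for addr in ("87.12.23.55", "2001:db8:85a3::7334", "::1"):
    print(addr, "->", ip_anonymized(addr))
```

This prints 87.12.23.0, 2001:db8:: and 0.0.0.0 respectively. A compressed address such as ::1 matches neither pattern and falls through to the defaults.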

Ry-
Mike Bretz
  • Thank you! I've changed the accepted answer to yours. – endzeit Jan 03 '15 at 23:57
  • Is it possible to anonymize the error log with a custom log_format? Or can only the access log? – tschale May 10 '17 at 09:43
  • To answer my question, according to the comments in [this answer](http://stackoverflow.com/a/4282642/3344078), it is not possible – tschale May 10 '17 at 10:11
  • Hi, is it possible to make it with the last octet divided by 2, floored and multiplied back by 2? https://stackoverflow.com/questions/48259409/gdpr-anonymize-ip-in-nginx-last-octet-2-0 – Adam Jan 15 '18 at 09:05
  • Nice solution, although it fails in some cases that use the compressed format, e.g. `1::`, `1:2:3:4:5:6:7::`, `::8`, `1::8`. – user Apr 27 '18 at 18:07
  • Thanks for the useful tip. I have an issue though: I added the lines in the http {} section, then for each individual service I have a dedicated log in the server {} section. If I add the access_log line there, it says: nginx: [emerg] invalid log level "anonymized" in /etc/nginx/conf.d/x.conf:11. What am I doing wrong? – BxlSofty May 04 '18 at 11:42
  • @BxlSofty `invalid log level "anonymized"` means that you try to use the `anonymized` parameter for `error_log` and not just for `access_log`. `error_log` doesn't work like `access_log` and has other parameters. – Karsten Mar 22 '22 at 18:11

The accepted answer seems a bit bloated. Since nginx version 1.11 it's possible to do it this way:

map $remote_addr $remote_addr_anon {
    ~(?P<ip>\d+\.\d+\.\d+)\.    $ip.0;
    ~(?P<ip>[^:]+:[^:]+):       $ip::;
    default                     0.0.0.0;
}
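The logic of this single map can be illustrated with a small Python sketch. It is an emulation, not nginx itself, and the names are made up: each regex is tried in order, the named capture is spliced into the replacement, and anything unmatched gets the default. The one-map form relies on nginx 1.11.0 or later because values such as $ip.0 mix a variable with literal text.

```python
import re

# Hypothetical emulation of the map above: (pattern, template) pairs tried in
# order, like nginx regex variants; "{ip}" stands in for the $ip capture.
RULES = [
    (re.compile(r"(?P<ip>\d+\.\d+\.\d+)\."), "{ip}.0"),
    (re.compile(r"(?P<ip>[^:]+:[^:]+):"), "{ip}::"),
]
DEFAULT = "0.0.0.0"


def remote_addr_anon(remote_addr: str) -> str:
    for pattern, template in RULES:
        match = pattern.search(remote_addr)
        if match:
            return template.format(ip=match.group("ip"))
    return DEFAULT
```

For example, remote_addr_anon("87.12.23.55") returns "87.12.23.0" and remote_addr_anon("2001:db8::1") returns "2001:db8::".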
  • I'm getting `nginx: [emerg] unknown "ip.0" variable` – Snowman Apr 12 '18 at 14:52
  • Request: can you update the regex to handle cases like `9::`, `::9`, `1:2:3:4:5:6:7::`, `1::8`? – user Apr 28 '18 at 04:36
  • "current version of nginx" is as vague as you can be. I have the same issue as @Snowman. – tolgap May 14 '18 at 10:05
  • @tolgap Unfortunately, neither of you has shared your nginx version. The map functionality required for my solution to work was added in nginx 1.11. I'll update my answer. At the time of this writing, releases tagged 1.12 and below are listed under "Legacy versions" (http://nginx.org/en/download.html). – Michael Gorianskyi May 15 '18 at 10:59
  • @Medorator I'm not very familiar with ipv6 structure, unfortunately :-( – Michael Gorianskyi May 15 '18 at 11:10
  • @Medorator try `^(?P<ip>[^:]+(?::[^:]+)?):` for all except ::9 – DSchmidt May 21 '18 at 11:34
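The IPv6 regex from the last comment can be checked against the edge cases listed earlier with a quick Python sketch. It assumes the capture group is named ip, matching the $ip used in the answer's map, and uses a hypothetical helper in place of the map rule.

```python
import re

# The IPv6 rule suggested in the comment; the group name "ip" is assumed
# to match the $ip used in the answer's map.
IPV6_RULE = re.compile(r"^(?P<ip>[^:]+(?::[^:]+)?):")


def anonymize_ipv6(addr: str) -> str:
    """Hypothetical helper: keep at most the first two groups, append '::'."""
    match = IPV6_RULE.search(addr)
    return f"{match.group('ip')}::" if match else "0.0.0.0"


for addr in ("9::", "1:2:3:4:5:6:7::", "1::8", "::9"):
    print(addr, "->", anonymize_ipv6(addr))
```

This gives 9::, 1:2::, 1:: and 0.0.0.0. As the comment says, ::9 (no leading group before the colons) still falls through to the default.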

Here's an nginx module that basically does this (anonymizing IP addresses in your logs): https://github.com/masonicboom/ipscrub. It generates a hash of the IP address as $remote_addr_ipscrub. The hash salt cycles every so often (configurable), so you can link requests without logging user IP addresses.
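The general idea (hash the address with a salt that is periodically replaced) can be sketched in Python as below. This is not ipscrub's actual implementation: the HMAC-SHA256 choice, the 12-character truncation, and the class and parameter names are all assumptions for illustration.

```python
import hashlib
import hmac
import secrets
import time


class RotatingIPHasher:
    """Hypothetical sketch: hash IPs with a salt replaced every `period_seconds`."""

    def __init__(self, period_seconds=600, clock=time.monotonic):
        self.period = period_seconds
        self.clock = clock
        self._salt = secrets.token_bytes(16)
        self._salt_epoch = self._current_epoch()

    def _current_epoch(self):
        # Index of the current salt period.
        return int(self.clock() // self.period)

    def scrub(self, ip: str) -> str:
        epoch = self._current_epoch()
        if epoch != self._salt_epoch:
            # New period: discard the old salt so earlier tokens can't be reproduced.
            self._salt = secrets.token_bytes(16)
            self._salt_epoch = epoch
        digest = hmac.new(self._salt, ip.encode(), hashlib.sha256).hexdigest()
        return digest[:12]
```

Within one period the same address always yields the same token, so requests can still be correlated; once the salt is replaced, old tokens can no longer be linked to new ones.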

XZVASFD

I think a good and practical solution is to anonymize the IP addresses before rotating your log files (which you should do daily). There are lots of scripts for this task available for Apache, and since the log formats are at least very similar, they should work out of the box or be easily adjustable. Of course, you still store the full IP for up to 24 hours, but that's better than having it lying around for years.
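A minimal Python sketch of such a pre-rotation pass, assuming the client address is the first space-separated field of each line (as in nginx's default combined format). It zeroes the last octet of IPv4 addresses and keeps only the first 32 bits of IPv6 addresses; the prefix lengths and function names are illustrative choices, not taken from any existing script.

```python
import ipaddress


def anonymize_ip(addr: str) -> str:
    """Zero the host part: keep a /24 for IPv4 and a /32 for IPv6 (assumed choices)."""
    try:
        ip = ipaddress.ip_address(addr)
    except ValueError:
        return addr  # not an IP address (e.g. "-" or malformed): leave as-is
    prefix = 24 if ip.version == 4 else 32
    return str(ipaddress.ip_network(f"{ip}/{prefix}", strict=False).network_address)


def anonymize_line(line: str) -> str:
    # Assumes the client address is the first space-separated field.
    addr, sep, rest = line.partition(" ")
    return anonymize_ip(addr) + sep + rest


def anonymize_file(src_path: str, dst_path: str) -> None:
    with open(src_path, encoding="utf-8", errors="replace") as src, \
         open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            dst.write(anonymize_line(line))
```

Run it on the previous day's log (for example a hypothetical access.log.1) before the rotation step compresses or archives it.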

iGEL
  • ^^ No, it is not. You may store IPs if you have a special interest (security measures) or if you have the consent of the user. – Jingo May 22 '18 at 15:18