
I'm trying to create a PHP visit counter. To avoid counting repeated visits from the same visitor, I've been thinking about saving the visitor's IP address in the database, which means reading it from $_SERVER.

I've read this sample function by @Dusza that seems nice and convenient:

<?php
function get_IP() {

// ADDRESS IP
   if     (getenv('HTTP_CLIENT_IP'))       $ipaddress = getenv('HTTP_CLIENT_IP');
   else if(getenv('HTTP_X_FORWARDED_FOR')) $ipaddress = getenv('HTTP_X_FORWARDED_FOR');
   else if(getenv('HTTP_X_FORWARDED'))     $ipaddress = getenv('HTTP_X_FORWARDED');
   else if(getenv('HTTP_FORWARDED_FOR'))   $ipaddress = getenv('HTTP_FORWARDED_FOR');
   else if(getenv('HTTP_FORWARDED'))       $ipaddress = getenv('HTTP_FORWARDED');
   else if(getenv('REMOTE_ADDR'))          $ipaddress = getenv('REMOTE_ADDR');
   else                                    $ipaddress = 'UNKNOWN';
   return $ipaddress;
    }
?>

But I've done some research here and found that this has a security hole: the user can spoof every one of those values except REMOTE_ADDR, and even that one can be modified by a proxy.

So I guess that when they say there's a security hole, it means I should sanitize the user's input when I insert it into the database, using bound parameters.
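For illustration, binding could look roughly like the sketch below. It assumes a PDO connection and a hypothetical `visits` table with an `ip` column; neither comes from the sample function above. The helper names `valid_ip_or_null` and `record_visit` are also made up for this example.

```php
<?php
// Returns the address unchanged if it is a syntactically valid IPv4 or IPv6
// address, or null otherwise. This rejects junk before it reaches the DB.
function valid_ip_or_null(?string $ip): ?string
{
    if ($ip === null) {
        return null;
    }
    $checked = filter_var($ip, FILTER_VALIDATE_IP);
    return $checked === false ? null : $checked;
}

// Stores one visit using a bound parameter, so the value is never
// concatenated into the SQL string.
// Assumption: a table `visits` with a text column `ip` exists.
function record_visit(PDO $pdo, string $ip): void
{
    $stmt = $pdo->prepare('INSERT INTO visits (ip) VALUES (:ip)');
    $stmt->execute([':ip' => $ip]);
}
```

Binding protects the query, and the validation step keeps non-IP strings out of the table, but neither makes a spoofed header any more trustworthy.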

Is there any other precaution?

Given that all other values are unreliable I should avoid using them altogether?

But what about the supposedly unspoofable value of REMOTE_ADDR? That can be modified by a proxy.

Any suggestions on which path I should take?

If you want to downvote, or vote the question to be closed or deleted, please leave me a comment about why, so I can improve my questions. Thanks.

Rosamunda
  • `REMOTE_ADDR` is where the response is being sent back to. The other addresses can be forged. You should be binding all values being written to the DB anyway. – user3783243 May 22 '18 at 12:59
  • "Given that all other values are unreliable I should avoid using them altogether?" — That's a matter of opinion and really depends on how reliable you need the data to be in the first place. – Quentin May 22 '18 at 12:59
  • `HTTP_X_FORWARDED_FOR` should only be trusted if it comes from a trusted source, like a load balancer or other proxy **in your control**. – ceejayoz May 22 '18 at 13:03
  • We use an AWS load balancer so **if the `REMOTE_ADDR` is the IP of that load balancer** then `HTTP_X_FORWARDED_*` can be trusted. (That's what we do anyway) – apokryfos May 22 '18 at 13:06
  • By the way IP based visitor counter is a terrible idea for other reasons – apokryfos May 22 '18 at 13:09
  • @apokryfos what other method would you use? – Rosamunda May 22 '18 at 13:10
  • Google Analytics, which in turn uses tracking cookies. Or roll your own kind of tracking cookies – apokryfos May 22 '18 at 13:11
  • That sounds like a very good idea, actually. :) But why is using an IP-based visitor counter *terrible*? – Rosamunda May 22 '18 at 13:13
  • Main concern with IP is that if I'm on a train and visit your page on my phone, then each time I hop cell areas I get a new IP. A single session can result in 4-5 different IP visits. Now of course how much this affects you depends on how much of your traffic will be mobile network users on the move or people hopping from wifi hotspot to wifi hotspot, but nowadays I think it's a big enough concern to warrant an alternative. – apokryfos May 22 '18 at 13:15
  • Thank you for the explanation :) And you're right, that's an actual concern. – Rosamunda May 22 '18 at 13:25
  • In a nutshell: IPs aren't as useful as many developers think. IPs are an implementation detail of a data transport mechanism. They don't identify users, machines or people and have no guarantee to be unique nor stable. They're useful up to the level of DoS/firewall/fail2ban protection, they're rarely useful at the application layer. – deceze May 22 '18 at 13:42

1 Answer


REMOTE_ADDR is the IP address established through a 3-way TCP/IP handshake. It is the IP the response will be sent back to. It is the only thing that your server has verified. Everything else is just arbitrary HTTP headers anyone could set.

Now, suppose you know that your server is running behind a proxy (e.g. a load balancer). The proxy masks the visitor's IP address, so your server only sees the proxy's IP. If you also know that the proxy is helpfully forwarding you the visitor's IP in an HTTP header (as a workaround so your server can still see it), then, and only then, may you use one of these HTTP headers, and only the one that you know your proxy is setting. If your server is not behind a proxy, use REMOTE_ADDR exclusively. Otherwise, consult your proxy's manual and implement according to the situation.
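As a minimal sketch of that rule, assuming the proxy appends the address it saw to `X-Forwarded-For` (check your proxy's manual; the header name and behaviour vary), something like the following could work. The function name `client_ip` and the `$trustedProxies` list are hypothetical and would hold the IPs of proxies you control.

```php
<?php
/**
 * Resolve the visitor's IP. The forwarded header is consulted only when the
 * direct peer (REMOTE_ADDR) is one of our own proxies.
 *
 * $server         - normally $_SERVER
 * $trustedProxies - hypothetical list of proxy/load balancer IPs we control
 */
function client_ip(array $server, array $trustedProxies = []): ?string
{
    $remote = $server['REMOTE_ADDR'] ?? null;
    if (!is_string($remote) || filter_var($remote, FILTER_VALIDATE_IP) === false) {
        return null;
    }

    // Not behind one of our proxies: REMOTE_ADDR is the only verified value.
    if (!in_array($remote, $trustedProxies, true)) {
        return $remote;
    }

    // Assumption: each proxy appends the peer it saw, so the rightmost
    // entries were written by our proxies and the left ones by anyone.
    $header = (string) ($server['HTTP_X_FORWARDED_FOR'] ?? '');
    $hops = array_reverse(array_map('trim', explode(',', $header)));
    foreach ($hops as $hop) {
        if (filter_var($hop, FILTER_VALIDATE_IP) === false) {
            break; // missing or malformed entry: stop and fall back
        }
        if (!in_array($hop, $trustedProxies, true)) {
            return $hop; // first address that is not one of our proxies
        }
    }

    return $remote;
}
```

A call might look like `client_ip($_SERVER, ['10.0.0.1'])`, where `10.0.0.1` stands in for your load balancer's address. With an empty trusted list the function only ever returns `REMOTE_ADDR`, which matches the no-proxy case.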

network diagram

deceze
  • "but you know that the proxy is helpfully forwarding you the visitor's IP in an HTTP header" Only when the proxy owner has configured the proxy to do so. There are anonymous proxies that won't expose the real IP of the visitor. – Raymond Nijland May 22 '18 at 13:37
  • Yes, the emphasis is on ***you know***. – deceze May 22 '18 at 13:38