33

First of all, I searched as best I could and read all SO questions that seem relevant, but nothing specifically answered this. This is not a duplicate, afaik.

Obviously, if anonymous voting on a website is allowed, there is no foolproof way to prevent someone voting more than once.

However, I am wondering if someone with experience can help me come up with a reasonably reliable way of tracking unique visitors and recording votes against those credentials.

Currently I am ensuring that only one vote per item/session combo is allowed; however, this is easily circumvented by restarting the browser, changing browsers/computers, or clearing your session data.

Recording against IP seems the next reasonable solution but I wonder if this will get false positives too often (multiple people on same LAN behind a NAT will have same external IP, etc).

Is there a middle ground to be had here or some other method/combination I am overlooking?

David Wheaton
Bo Jeanes
  • 3
    Let me phrase it this way: You want to confirm that someone is a specific, unique individual? That's authentication. Authentication is the fundamental opposite of anonymity. – Tetsujin no Oni Jun 24 '09 at 13:22
  • 2
    I don't want to confirm, I just need a better "best guess" -- anonymous users are anonymous purely because I don't require them to create an account. I mean they are guests to the system, so without a user_id, what is the next best piece of information to associate with their vote to get reasonably close to one vote per "user"? – Bo Jeanes Jun 24 '09 at 23:11

8 Answers

23

I'd collect as much data about the session as possible without asking any questions directly (browser, OS, installed plugins, all with version numbers, IP address, etc.) and hash it.

Record the hash and increment a counter if you want multiple votes to be allowed. Include a timestamp (daily, hourly etc) in the salt to make votes time sensitive, say 5 votes per day.
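
A minimal sketch of this idea in Python, under stated assumptions: the attribute names, the `fingerprint` function and the `VoteLimiter` class are hypothetical and only illustrate the approach; a real site would keep the counts in a database rather than in memory. The time bucket (for example a date string) is mixed into the hash so that the counter effectively resets in each period.

```python
import hashlib
from collections import Counter


def fingerprint(attributes: dict, bucket: str = "") -> str:
    """Hash the collected attributes, optionally mixed with a time bucket."""
    # Sort the keys so the same attributes always produce the same hash.
    parts = [f"{key}={attributes[key]}" for key in sorted(attributes)]
    parts.append(f"bucket={bucket}")
    return hashlib.sha256("\n".join(parts).encode("utf-8")).hexdigest()


class VoteLimiter:
    """In-memory vote counter keyed by fingerprint (hypothetical helper)."""

    def __init__(self, max_votes: int) -> None:
        self.max_votes = max_votes
        self.counts = Counter()

    def try_vote(self, attributes: dict, bucket: str = "") -> bool:
        key = fingerprint(attributes, bucket)
        if self.counts[key] >= self.max_votes:
            return False  # this fingerprint has used up its votes
        self.counts[key] += 1
        return True
```

With `max_votes=5` and the current date as `bucket` this gives "5 votes per day"; with `max_votes=1` and an empty bucket it gives one vote ever per fingerprint.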

mlambie
  • Hashing lots of user info is a brilliant idea! I don't want a time based salt though as currently the voting model is one up/down vote ever. – Bo Jeanes Jun 24 '09 at 22:50
  • I'm currently implementing this method. I can grab the browser/OS (as part of the user-agent string) and the IP with PHP, but I can't get a list of plugins without using JavaScript, which a user could modify, which would change the hash value and defeat the purpose of this system. So I'm hoping IP + user-agent will be unique enough. – Jim Greenleaf Dec 30 '10 at 22:53
13

The simplest answer is to use a cookie. Obviously it's vulnerable to people clearing their cookies, but anonymous voting is inherently approximate anyway.

In practice, unless the topic being voted on is in some way controversial or inflammatory, people aren't going to have a motive behind rigging the vote anyway.

IP is more 'reliable' but will produce an unacceptably high level of collisions due to NATs.

How about a more unique identifier composed of IP + user-agent (maybe a hash)? That effectively means for each IP, each exact OS/browser version pair gets 1 vote, which is a lot closer to 1 vote per person. Most browsers provide detailed version information in the user-agent -- I'm not sure, but my gut feel is that this would prevent the majority of collisions caused by NATs.

The only place that would still produce lots of collisions is a corporate environment with a standardised network, where everyone is using an identical machine.

ben_h
  • 1
    Not a bad idea. It will still have some collisions due to identical browsers (for instance, everyone on my network is using the same version of Leopard and Safari) but would be much more reliable than IP alone. – Bo Jeanes Jun 24 '09 at 22:48
12

The Chinese have to share one IPv4 address with hundreds of others; HP/Compaq/DEC has almost 50 million addresses. IPv6 doesn't help, as everyone gets addresses by the billion. A person just is not the same as an IP address, and that notion is becoming ever more false.

There are just no proper ways to do this on the Internet. Persons are simply a concept unknown on the Internet, and any idea to introduce the concept is unlikely to succeed. (Too many governments would not want this to happen, for instance.)

Of course, you can relate the number of votes per IP to the number of repeat page visits from that IP, especially in combination with cookie tracking. This works best if you estimate that number before you start the voting period. If the top 5% most popular articles are typically read 10 times from a single IP, it's likely that 10 people share that IP and they should get 10 votes. Cookies can be used to prevent them from stealing each other's votes, but on the whole they can't skew your poll. (Note: this fails in small communities where a large group of voters comes from a small number of IPs; in particular, this happens around universities.)
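
One possible reading of this scheme, sketched in Python. The names `estimate_quotas` and `QuotaPoll` are hypothetical, and using "the most views any single article got from an IP" as that IP's quota is an assumption made for illustration, not something the answer specifies.

```python
from collections import Counter


def estimate_quotas(page_views: list) -> dict:
    """Map each IP to the most times it viewed any single article.

    page_views is a list of (ip, article_id) pairs collected before voting.
    """
    views = Counter(page_views)  # (ip, article_id) -> view count
    quotas = {}
    for (ip, _article), count in views.items():
        quotas[ip] = max(quotas.get(ip, 0), count)
    return quotas


class QuotaPoll:
    """One vote per cookie, capped per IP by the estimated quota."""

    def __init__(self, quotas: dict) -> None:
        self.quotas = quotas
        self.votes_by_ip = Counter()
        self.cookies_seen = set()

    def try_vote(self, ip: str, cookie_id: str) -> bool:
        if cookie_id in self.cookies_seen:
            return False  # this browser has already voted
        if self.votes_by_ip[ip] >= self.quotas.get(ip, 1):
            return False  # the IP has used its estimated share of votes
        self.cookies_seen.add(cookie_id)
        self.votes_by_ip[ip] += 1
        return True
```

IPs that were never seen before the voting period default to a quota of one vote here; that default is also an assumption.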

MSalters
3

If you're not looking at authenticating voters, then you're going to be getting some duplicate votes no matter what you use. I'd use a cookie, and have done with it for the anonymous users.

UserVoice allows both anonymous voting and voting when logged in, but then allows the admin to filter out anonymous votes - a nice solution to this problem.

Codebeef
  • I am well aware that without authentication it's impossible to guarantee no duplicates, I am simply looking at a way to reduce them. – Bo Jeanes Jun 24 '09 at 22:49
3

Anything based on IP addresses isn't an option. NAT has been mentioned, but only as if it affected home users. There are many larger installations that use NAT: some corporations can have thousands of users pooled behind a single IP address. There are also ISPs that use proxy servers for their users, another case where many thousands of users can appear to your application as a single address. Adding unique UA combinations to this won't help, as there isn't enough variation.

A persistent cookie is going to be your best bet - and you'll have to live with the fact that it is easy to game. At least when the cookie is persistent (as opposed to session based) you'll catch the majority of users who run a single browser.
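
As a rough sketch, a persistent voter cookie could be issued and read back with Python's standard `http.cookies` module. The cookie name `voter_id`, the one-year lifetime and both helper functions are assumptions made for illustration.

```python
import secrets
from http.cookies import SimpleCookie
from typing import Optional, Tuple

ONE_YEAR_SECONDS = 365 * 24 * 60 * 60  # assumed lifetime; pick what suits the poll


def new_voter_cookie() -> Tuple[str, str]:
    """Return a random voter id and the matching Set-Cookie header value."""
    voter_id = secrets.token_urlsafe(16)
    cookie = SimpleCookie()
    cookie["voter_id"] = voter_id
    # Max-Age makes the cookie persistent instead of session-scoped.
    cookie["voter_id"]["max-age"] = ONE_YEAR_SECONDS
    cookie["voter_id"]["path"] = "/"
    cookie["voter_id"]["httponly"] = True
    return voter_id, cookie["voter_id"].OutputString()


def read_voter_id(cookie_header: str) -> Optional[str]:
    """Extract the voter id from an incoming Cookie header, if present."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    morsel = cookie.get("voter_id")
    return morsel.value if morsel is not None else None
```

The server would then store one vote per (voter id, item) pair and ignore repeats.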

If you really want to rely on the results, you are going to have to add some form of identification in the process (like e-mail validation, which is still gameable).

At the end of the day any internet survey is going to have flaws (like: http://www.time.com/time/arts/article/0,8599,1894028,00.html), and you'll have to live with this.

lstoll
  • Yup, I can live with duplicates, but reducing it is better. If people really want to game the system, they can and will. This is just a reduction measure – Bo Jeanes Jun 24 '09 at 22:52
3

Use a persistent cookie to allow only one vote per item, and record the IP. If there are more than 100 (1,000? 10,000?) requests in less than X minutes, then "soft block" the IP.

The "soft block": don't show a page saying "your IP has been blocked"; show your "thank you for your vote" page and DON'T record the vote in your DB. You can even increase the counter for that IP only. You want to keep them from knowing that you are blocking their IP.
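
A minimal in-memory sketch of the soft block in Python. The class name `SoftBlockingPoll` and its parameters are hypothetical, and the per-item cookie check is left out so that only the rate limiting is shown.

```python
import time
from collections import defaultdict, deque


class SoftBlockingPoll:
    """Silently drops votes from IPs that exceed a request rate."""

    def __init__(self, max_requests: int, window_seconds: float,
                 clock=time.monotonic) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the behaviour can be tested
        self.recent = defaultdict(deque)  # ip -> timestamps of recent requests
        self.recorded_votes = []

    def handle_vote(self, ip: str, item_id: str) -> str:
        now = self.clock()
        timestamps = self.recent[ip]
        # Forget requests that have fallen out of the sliding window.
        while timestamps and now - timestamps[0] > self.window_seconds:
            timestamps.popleft()
        timestamps.append(now)
        if len(timestamps) <= self.max_requests:
            self.recorded_votes.append((ip, item_id))
        # The response is identical whether or not the vote was recorded.
        return "Thank you for your vote"
```

Blocked requests still count toward the window, so an IP that keeps flooding stays blocked until it slows down.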

Victor
  • I was previously doing that, but something like curl doesn't use cookies so they could game the votes by just running curl in a while loop. I settled with user agent + IP combo. Nothing will ever be 100% without tying votes to some sort of user record, but this seems to have been working well enough. – Bo Jeanes Nov 25 '09 at 09:16
1

Two ideas not mentioned yet are:

  • Asking for the user's email address and emailing them a verification link
  • Using a captcha

Obviously the former can be circumvented with disposable email addresses and so on, but gives you an audit trail, and provides a significant hurdle to casual/bot vote-stuffing. A good captcha likewise severely limits vote-stuffing, but with all the usual caveats surrounding their use.
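
For the email option, one way to build the verification link is to sign the address and item with a server-side secret, so no pending-vote table is needed. This is only a sketch: `SECRET_KEY`, the URL layout and all three function names are assumptions.

```python
import hashlib
import hmac
from urllib.parse import urlencode

SECRET_KEY = b"replace-with-a-random-server-side-secret"  # assumption


def make_token(email: str, item_id: str) -> str:
    """Sign the (email, item) pair so the link cannot be forged."""
    message = f"{email.lower()}\n{item_id}".encode("utf-8")
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def verification_link(base_url: str, email: str, item_id: str) -> str:
    query = urlencode({"email": email, "item": item_id,
                       "token": make_token(email, item_id)})
    return f"{base_url}?{query}"


def confirm_vote(email: str, item_id: str, token: str, confirmed: set) -> bool:
    """Record the vote only if the token is valid and not already used."""
    expected = make_token(email, item_id).encode("utf-8")
    if not hmac.compare_digest(expected, token.encode("utf-8")):
        return False
    key = (email.lower(), item_id)
    if key in confirmed:
        return False  # this address already voted on this item
    confirmed.add(key)
    return True
```

The `confirmed` set doubles as the audit trail mentioned above; in practice it would be a database table.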

grahamparks
  • well users are anonymous not for anonymity's sake but for ease's sake. I don't want to have them fill in any forms for each vote or at all really – Bo Jeanes Jun 24 '09 at 23:09
0

I have the same problem, and here's what I am planning on doing...

Set a persistent cookie. Check the cookie to decide whether a particular vote can be cast. Additionally, store some data about the vote request in the form of a combination of IP address + User Agent, and then use this value to limit the number of votes to, say, 10 per day.

What is the best way of going about creating this hash (IP + UA String)?
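
One reasonable way to build such a hash, sketched in Python: a keyed hash (HMAC) over the IP and User Agent string, so the stored value does not directly reveal either. `SECRET_KEY`, `voter_key` and `DailyVoteCap` are hypothetical names, and the in-memory counter stands in for the DB.

```python
import hashlib
import hmac
from collections import Counter
from datetime import date

SECRET_KEY = b"replace-with-a-random-server-side-secret"  # assumption


def voter_key(ip: str, user_agent: str) -> str:
    """Keyed hash of IP + User Agent."""
    # A newline cannot appear in either value, so it is a safe separator.
    message = f"{ip}\n{user_agent}".encode("utf-8")
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


class DailyVoteCap:
    """Cookie check first, then a per-day cap on the IP + UA hash."""

    def __init__(self, max_per_day: int = 10) -> None:
        self.max_per_day = max_per_day
        self.counts = Counter()  # (voter key, ISO date) -> votes cast

    def try_vote(self, ip: str, user_agent: str,
                 has_voted_cookie: bool, today: date) -> bool:
        if has_voted_cookie:
            return False  # the browser already carries the "voted" cookie
        key = (voter_key(ip, user_agent), today.isoformat())
        if self.counts[key] >= self.max_per_day:
            return False
        self.counts[key] += 1
        return True
```

Any change to the User Agent string yields a fresh key, so this limits casual repeat voting rather than a determined attacker.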

  • 1
    You don't want to use a cookie. That's the easiest possible way to game the system. Curl and wget won't preserve the cookies, so a simple while loop on the command line will entirely annihilate any cookie-based system in minutes. – Bo Jeanes Nov 04 '10 at 22:21
  • As I have mentioned above, am not relying solely on the cookie... am storing the IP address and User Agent string in the DB as well, and as of now am allowing only 5 votes per day from the same IP+UA combo. – paganwinter Nov 08 '10 at 07:17
  • Urgent attention needed: "Attacker adds 1 to the version of his browser in the UA every 5 requests and gets infinite votes!" There is no way you can secure that appropriately without using a "real" identity check, or at least something which would cost the attacker money, like sending a unique code via SMS. – Shadok Mar 26 '12 at 15:01
  • why not require the cookie be present? – Brady Moritz Aug 23 '16 at 21:45