
Suppose you are writing a survey application and would like to guarantee that results are secure from the user's standpoint. Put simply, I know what IP you came from, but I want you to sleep well at night knowing I know nothing about your responses. I can't store the IP in raw form.

I do need to guarantee one thing, though: that you answer the questions only once. So once your PC comes in with some data, I need to recognize that your PC has already responded to the survey.

Any suggestions on how to best handle it?

Thanks -mac

James Raitsev
  • Great question - this is why I do not trust "anonymous surveys" at work. Also consider what would happen if one were to use a bunch of proxy servers (unlikely). I suspect that you cannot guarantee either one of the two :) – Hamish Grubijan Aug 25 '10 at 01:50
  • Right. The boss wants to do a survey and sincerely does not want to know who said what. I am curious, but hey, the bossman said no one can know. Any approaches you have experienced would be nice. – James Raitsev Aug 25 '10 at 01:58
  • 6
    You actually have two problems, dealing with the data securely and convincing somebody that you are dealing with the data securely :) – Arnold Spence Aug 25 '10 at 01:58
  • What I do know is that SurveyMonkey Pro can collect IP addresses while the regular one "supposedly" cannot. I might recommend an old-school method: send out an email with a blank Word doc and have respondents type their answers, print them, and drop the papers in a box. They can then be re-scanned. Maybe most people are not as paranoid as me. They can find out who is who anyhow :) by linguistic signatures and content. – Hamish Grubijan Aug 25 '10 at 02:02
  • Yeah, that's a bit too much. I want to build a simple web app whose content is not easy to take advantage of. – James Raitsev Aug 25 '10 at 02:04
  • I agree with Arnold Spence about 2 problems. The first is unsolvable. The second is usually handled by hiring outsiders to do the survey. – emory Aug 25 '10 at 02:05
  • Suppose you are that outsider. How would you implement your logic? – James Raitsev Aug 25 '10 at 02:08
  • @Hamish I complimented your name, pal. It means something in Russian. I literally said "cool name". – James Raitsev Aug 25 '10 at 02:09
  • 1
    @Hamish Grubijan The drop papers in a box solution does not enforce the guarantee of one person one survey. Some employees will blow off the survey and others will print multiple copies of their responses. – emory Aug 25 '10 at 02:12
  • 1
    @mac for the outsider, implementation is by reputation. If employees believe that the outsider will not share sensitive data with the boss, they will be more open. The technical requirements are minimal but you can not do this in-house. – emory Aug 25 '10 at 02:25
  • 1
    "There is no good technical solution to a behavioral problem." :) – Hamish Grubijan Aug 25 '10 at 02:26

2 Answers


Create a one-way hash of the IP address (and any other unique identifying attributes), and store the hash with the response. That way no one can look up the IP address for a response, but you can compare the hash with previously submitted responses to ensure people only submit the form once.

There's not much you can do to convince someone you're respecting their privacy. Just don't abuse the trust, and people will work it out.

(For an idea on how to create a hash in Java, see "How can I generate an MD5 hash?")
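A minimal Java sketch of this approach, assuming SHA-256 through the standard `java.security.MessageDigest` API (swapped in for the MD5 mentioned above) and an in-memory set standing in for wherever the responses are actually stored. `ResponseDeduplicator`, `fingerprint` and `tryAccept` are hypothetical names, not part of any library.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper: keeps only a one-way hash of the respondent's
// identifying attributes, never the raw IP address.
class ResponseDeduplicator {
    // Hashes that have already submitted (in memory here; a real app
    // would persist them next to the responses).
    private final Set<String> seenHashes = new HashSet<>();

    // SHA-256 hex digest over the IP plus any extra attributes.
    // A zero separator byte keeps ("1.2", "3") and ("1", "2.3") apart.
    static String fingerprint(String ip, String... extraAttributes) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            md.update(ip.getBytes(StandardCharsets.UTF_8));
            for (String attr : extraAttributes) {
                md.update((byte) 0);
                md.update(attr.getBytes(StandardCharsets.UTF_8));
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java SE platform is required to support SHA-256.
            throw new IllegalStateException(e);
        }
    }

    // Returns true (and records the hash) the first time a fingerprint is
    // seen; returns false if it has already been used to submit.
    synchronized boolean tryAccept(String ip, String... extraAttributes) {
        return seenHashes.add(fingerprint(ip, extraAttributes));
    }
}
```

A first call such as `tryAccept("192.0.2.10", userAgent)` returns `true` and an identical repeat returns `false`. A bare hash of an IPv4 address can still be reversed by trying candidate addresses, so this sketch only covers the bookkeeping, not the privacy guarantee.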

jimmycavnz
  • Cool! This is similar to how Gmail does not actually store our passwords, but hashes instead. This solves half of the problem, so that already proves me wrong. – Hamish Grubijan Aug 25 '10 at 02:22
  • You'd have to **heavily** iterate the hash (tiny keyspace...). But even then, if you want to be able to verify the hash in a feasible amount of time, the adversary would be able to pick an IP address and check if it matches the hash or not. He can narrow it down to a smaller range than 2^32. – L̲̳o̲̳̳n̲̳̳g̲̳̳p̲̳o̲̳̳k̲̳̳e̲̳̳ Aug 25 '10 at 02:23
  • This idea has two problems: 1. Multiple employees sharing the same computer: it will block the second employee from completing the survey. 2. One employee using multiple computers to bypass the one-employee, one-survey guarantee. But I agree heavily with "There's not much you can do to convince someone you're respecting their privacy. Just don't abuse the trust, and people will work it out." – emory Aug 25 '10 at 02:30
  • @Longpoke and @emory: which is why you would add extra identifying attributes. More attributes (such as a browser session ID, or even their surname) would make it harder for attackers, and would also make the identifier *more* unique. – jimmycavnz Aug 25 '10 at 03:24
  • I'm not sure I understand what you're getting at. By "session ID" do you mean a randomly generated token? Then the user can just clear his cookie and get a new one and thus vote again? Same for surname. – L̲̳o̲̳̳n̲̳̳g̲̳̳p̲̳o̲̳̳k̲̳̳e̲̳̳ Aug 25 '10 at 04:44
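To make the "tiny keyspace" point concrete, here is a hypothetical Java sketch: given an unsalted SHA-256 hash and a guessed /24 range (an illustrative assumption), recovering the address takes at most 256 hash computations. Iterating the hash makes each guess more expensive but does not enlarge the 2^32 IPv4 space. `IpHashEnumeration` and its methods are made-up names.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.Optional;

// Hypothetical demo: recovers an IPv4 address from its unsalted SHA-256
// hash by trying every address in a guessed /24 block.
class IpHashEnumeration {
    static byte[] sha256(String s) {
        try {
            return MessageDigest.getInstance("SHA-256")
                    .digest(s.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Tries prefix.0 through prefix.255, e.g. prefix = "10.1.2".
    static Optional<String> recover(byte[] storedHash, String prefix) {
        for (int last = 0; last <= 255; last++) {
            String candidate = prefix + "." + last;
            if (Arrays.equals(sha256(candidate), storedHash)) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }
}
```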

You can't guarantee either of these. All you can do is raise the bar so it's harder to get around. If someone really wants to get around your tracking, they can, provided they know enough about your system. The good news is that most people either don't want to bother or don't know how.

You can generate a cryptographic hash and store it in a cookie in the person's browser if you want to get around the proxy problem. Lots of websites do this when creating sessions to track authentication. It's something like using an HMAC to generate a value that identifies the browser with a unique key and can't be faked. If they clear their browser's cookies, though, you won't be able to track them.
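One way the HMAC-signed cookie could be sketched in Java, assuming `HmacSHA256` from `javax.crypto.Mac`, a secret key that never leaves the server, and a cookie value of the form `<randomId>.<tag>`. The class name, method names and cookie format are hypothetical choices for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.InvalidKeyException;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.security.SecureRandom;
import java.util.Base64;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical signer: issues a random browser ID plus an HMAC-SHA256 tag,
// so a cookie value can't be forged without the server's secret key.
class BrowserTokenSigner {
    private static final SecureRandom RANDOM = new SecureRandom();
    private final SecretKeySpec key;

    BrowserTokenSigner(byte[] secret) {
        this.key = new SecretKeySpec(secret, "HmacSHA256");
    }

    private String tag(String id) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(key);
            byte[] raw = mac.doFinal(id.getBytes(StandardCharsets.UTF_8));
            return Base64.getUrlEncoder().withoutPadding().encodeToString(raw);
        } catch (NoSuchAlgorithmException | InvalidKeyException e) {
            throw new IllegalStateException(e);
        }
    }

    // New cookie value "<randomId>.<tag>"; URL-safe Base64 never contains '.'.
    String issue() {
        byte[] idBytes = new byte[16];
        RANDOM.nextBytes(idBytes);
        String id = Base64.getUrlEncoder().withoutPadding().encodeToString(idBytes);
        return id + "." + tag(id);
    }

    // Returns the browser ID if the tag checks out, or null if forged/malformed.
    String verify(String cookieValue) {
        int dot = cookieValue.indexOf('.');
        if (dot < 0) {
            return null;
        }
        String id = cookieValue.substring(0, dot);
        byte[] expected = tag(id).getBytes(StandardCharsets.UTF_8);
        byte[] actual = cookieValue.substring(dot + 1).getBytes(StandardCharsets.UTF_8);
        // Constant-time comparison avoids leaking how many bytes matched.
        return MessageDigest.isEqual(expected, actual) ? id : null;
    }
}
```

As the answer notes, this identifies a browser, not a person: clearing cookies yields a fresh, equally valid ID.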

A one-way hash of the IP address keeps the raw IP from being stored, but the same IP always hashes to the same value, so you can still tell if someone submits twice from it. However, if they go to an internet cafe, voilà, they can resubmit. You'd use SHA-1, MD5, etc. for that.

You can do the same thing with the email address: hash it. To make people want to participate, send the results to their email address instead of displaying them in the browser. People just have to trust that you won't do nasty things with their email.
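A Java sketch of hashing the email address, assuming SHA-256 and a normalization step (trim plus lowercase) so trivially different spellings of the same address collide; the normalization rule and the `EmailHasher` name are assumptions, not part of the answer. Anyone holding the list of invitees can hash each address and match it, so this guards against casual inspection rather than against the survey owner.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Locale;

// Hypothetical helper: normalizes an email address, then hashes it so the
// stored value can detect repeats without keeping the address itself.
class EmailHasher {
    static String normalize(String email) {
        // Assumption: ignore surrounding whitespace and letter case.
        // (Strictly, only the domain part is guaranteed case-insensitive.)
        return email.trim().toLowerCase(Locale.ROOT);
    }

    static String hash(String email) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256")
                    .digest(normalize(email).getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }
}
```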

Another idea works if you know whom you want to send the survey to. Generate a random number that identifies each individual response, and email a link containing it to each person. They then submit under that number, and as long as you don't record which email got which random number, you can't correlate the answers with the email address. Once a random number has been used, you don't let anyone submit with it again. Track responses once; display results many times.
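The single-use link scheme might look roughly like this in Java. It is a sketch under assumptions: 128-bit tokens from `SecureRandom` (large enough that guessing one is impractical), an in-memory set instead of a database, and hypothetical names (`SurveyTokens`, `issue`, `redeem`). The store holds tokens only; the mailing step pairs each token with an address and then discards the pairing.

```java
import java.security.SecureRandom;
import java.util.Base64;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical token store: issues random single-use survey tokens and
// deliberately keeps no record of which email address got which token.
class SurveyTokens {
    private static final SecureRandom RANDOM = new SecureRandom();
    // Tokens that have been issued but not yet redeemed.
    private final Set<String> unused = ConcurrentHashMap.newKeySet();

    // Creates a fresh token to embed in one emailed survey link.
    String issue() {
        byte[] bytes = new byte[16];
        RANDOM.nextBytes(bytes);
        String token = Base64.getUrlEncoder().withoutPadding().encodeToString(bytes);
        unused.add(token);
        return token;
    }

    // Accepts a submission only the first time a valid token is presented;
    // remove() is atomic, so two concurrent submissions can't both succeed.
    boolean redeem(String token) {
        return unused.remove(token);
    }
}
```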

You can combine some of these together to try and work around the deficiencies of the other.

chubbsondubs