
I'd like to play around with building a recommendation system, by which I mean an algorithm that looks at the preferences and/or reviews a user has posted and then makes recommendations for them, similar to what Netflix or Amazon use.

What are some good resources for learning how to write something like this? Where should I start?

TM.

2 Answers


Check out the Wikipedia page on the Netflix Prize and its discussion forum. Also, the somewhat related 2009 GitHub Contest is a good source for full source code on a number of different recommendation engines. And obviously there's also the Wikipedia page on the topic itself, which has some decent links.

If you start writing your own, you'll need a corpus to train and test on. I'd actually recommend using the Netflix Prize's data set. Just carve the data set into two pieces: train on the first piece and score your algorithm's predictions on the second piece, which it has not seen.
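As a rough sketch of that carve-train-score loop, something like the following would work. The ratings here are made-up `(user, item, rating)` triples, not Netflix Prize data, and the function names (`split_ratings`, `train_item_means`, `rmse`) are hypothetical. The "model" is only a per-item mean baseline, and RMSE is used as the score because that is the metric the Netflix Prize was judged on.

```python
import math
import random
from collections import defaultdict


def split_ratings(ratings, holdout_fraction=0.2, seed=0):
    """Shuffle the ratings and carve them into a training and a held-out piece."""
    shuffled = list(ratings)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) - int(len(shuffled) * holdout_fraction)
    return shuffled[:cut], shuffled[cut:]


def train_item_means(train):
    """Baseline model: mean rating per item, plus a global mean as fallback."""
    totals = defaultdict(lambda: [0.0, 0])
    for _user, item, rating in train:
        totals[item][0] += rating
        totals[item][1] += 1
    global_mean = sum(rating for _, _, rating in train) / len(train)
    item_means = {item: s / n for item, (s, n) in totals.items()}
    return item_means, global_mean


def rmse(model, test):
    """Root mean squared error of the baseline's predictions on held-out ratings."""
    item_means, global_mean = model
    squared_error = sum(
        (item_means.get(item, global_mean) - rating) ** 2
        for _user, item, rating in test
    )
    return math.sqrt(squared_error / len(test))


# Made-up (user, item, rating) triples standing in for a real corpus.
ratings = [
    (1, "A", 5), (1, "B", 3), (2, "A", 4), (2, "C", 2),
    (3, "B", 4), (3, "C", 1), (4, "A", 5), (4, "B", 2),
    (5, "C", 3), (5, "A", 4),
]

train, test = split_ratings(ratings)
model = train_item_means(train)
print(f"train={len(train)} test={len(test)} RMSE={rmse(model, test):.3f}")
```

Any smarter algorithm can be dropped in place of `train_item_means`; as long as it is scored on the same held-out piece, a lower RMSE than this baseline means it is actually learning something.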

Addendum: A somewhat related and scary application of this sort of thing is predicting demographic information: a user's gender, age, household income, IQ, sexual orientation, etc. You could probably predict most of these attributes from the Netflix Prize dataset with a fairly high degree of accuracy. Fortunately everyone in that dataset is just a number.

Bob Aman
  • What's scary about that? Marketers try to predict you all the time based upon your browser, IP, and other info from the HTTP header. It's not actual info, just "informed stereotypes" (conditional Bayes). – isomorphismes Mar 14 '11 at 03:39
  • Because the data was represented as 'anonymous' but actually wasn't? This is particularly bad if the user never opted-in to their data being shared. – Bob Aman Mar 14 '11 at 20:07
  • I might not understand exactly what you mean. Are you saying anonymity was violated because I can accurately guess User 2871875's demographic characteristics? – isomorphismes Apr 15 '11 at 09:13
  • No... read the abstract of that last link. They're getting a lookup of User 2871875's record. Not just demographic characteristics. – Bob Aman Apr 15 '11 at 18:51

Take a look at pysuggest, a Python library that implements a variety of recommendation algorithms for collaborative filtering (the approach used by Amazon.com).
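To get a feel for what collaborative filtering does before reaching for a library, here is a minimal item-based sketch in plain Python. It does not use pysuggest or reproduce its API; the function names (`cosine`, `recommend`) and the ratings are made up for illustration, and cosine similarity is just one common choice of similarity measure.

```python
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity between two sparse {user: rating} vectors."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[u] * b[u] for u in shared)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)


def recommend(ratings, user, top_n=3):
    """Rank items the user has not rated by similarity to items they did rate."""
    by_item = defaultdict(dict)
    for u, item, rating in ratings:
        by_item[item][u] = rating
    seen = {item: rating for u, item, rating in ratings if u == user}

    scores = {}
    for candidate, vector in by_item.items():
        if candidate in seen:
            continue
        # Sum of (similarity to a rated item) * (the user's rating of that item).
        score = sum(cosine(vector, by_item[item]) * r for item, r in seen.items())
        if score > 0:
            scores[candidate] = score
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


# Made-up (user, item, rating) triples.
ratings = [
    ("ann", "A", 5), ("ann", "B", 5),
    ("bob", "A", 5), ("bob", "B", 4), ("bob", "C", 1),
    ("cat", "A", 1), ("cat", "C", 5),
    ("dan", "A", 5),
]

print(recommend(ratings, "dan"))
```

Here "dan" has only rated item A, and B comes out ahead of C because the people who liked A also rated B highly. A real system would precompute the item-item similarities rather than recomputing them per request.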

Sridhar Ratnakumar