I am trying to calculate precision and recall at n for a data set with Boolean preferences, using the item-based recommender provided in Mahout.

I am using `GenericBooleanPrefItemBasedRecommender` and the following evaluator method:

`IRStatistics evaluate(RecommenderBuilder recommenderBuilder, DataModelBuilder dataModelBuilder, DataModel dataModel, IDRescorer rescorer, int at, double relevanceThreshold, double evaluationPercentage) throws TasteException;`

Since the preferences are Boolean, the set of "relevant" or "good" movies for a user is all the ones rated 1.

If I run the same code many times, it always gives the same values of precision and recall, and they are always equal to each other. Why? I am NOT using RandomUtils.useTestSeed(). How does it split the data into training and test sets?

Possibilities:
a) It randomly divides the whole data set into test and training sets at the beginning, OR for each user it randomly puts a fixed percentage of relevant movies into the test set. How does it decide this percentage, since there is no parameter through which the user can supply it? Why do I get the same values of P and R each time I run the code, and why are P at n and R at n the same?
b) For each user, it puts all relevant movies in the test set. Then there is no information about that user left in the training set to make any recommendations, so this is not possible.

Since P and R at n are equal, does that mean that, for each user, the number of relevant movies moved to the test set always equals the number of recommendations, i.e. n? And if the n relevant movies put in the test set are chosen randomly, why do I get the same values of P and R each time I run the code?

The only explanation I can think of that fits the results is that the evaluator calculates P and R at n as follows: one by one, for each user, it randomly puts n relevant movies in the test set. The process has to be random, since it cannot distinguish between the relevant movies, yet it appears to be fixed: each time the code is run, it picks the same n relevant movies for each user. It then makes n recommendations and calculates P and R at n.

While this explains the results, I don't think it is a good process because:
1) The training/test split is not defined as a percentage and is thus not consistent with the usual definition.
2) P and R will always be equal to each other, so we get only one metric instead of two.
3) The supposedly random choice of n movies is the same each time.
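
To make point 2 concrete, here is a minimal sketch in plain Java (no Mahout classes) of precision and recall at n for a single user. It assumes the scenario hypothesized above: exactly n relevant items are held out and exactly n distinct recommendations are returned. `PrecisionRecallAtN` and its methods are hypothetical names for illustration, not Mahout API.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of Mahout: precision and recall at n for one
// user, given a held-out "relevant" set and a list of distinct recommendations.
final class PrecisionRecallAtN {

    // Number of recommended items that are also in the held-out relevant set.
    static int hits(Set<Long> relevant, List<Long> recommended) {
        int count = 0;
        for (Long itemId : recommended) {
            if (relevant.contains(itemId)) {
                count++;
            }
        }
        return count;
    }

    // Precision: hits divided by the number of recommendations made.
    static double precision(Set<Long> relevant, List<Long> recommended) {
        return recommended.isEmpty() ? 0.0 : (double) hits(relevant, recommended) / recommended.size();
    }

    // Recall: hits divided by the number of held-out relevant items.
    static double recall(Set<Long> relevant, List<Long> recommended) {
        return relevant.isEmpty() ? 0.0 : (double) hits(relevant, recommended) / relevant.size();
    }

    public static void main(String[] args) {
        // Assumption: n = 5 relevant items held out, 5 recommendations returned.
        Set<Long> relevant = new HashSet<>(Arrays.asList(1L, 2L, 3L, 4L, 5L));
        List<Long> recommended = Arrays.asList(2L, 5L, 7L, 8L, 9L);
        System.out.println(precision(relevant, recommended)); // prints 0.4
        System.out.println(recall(relevant, recommended));    // prints 0.4
    }
}
```

Both metrics share the numerator (the number of hits), so they coincide whenever the two denominators are both n. They would differ only if the recommender returned fewer than n items or fewer than n relevant items were held out.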

EDIT: I am adding my full code in case it helps answer my question:

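import java.io.File;
import org.apache.mahout.cf.taste.eval.IRStatistics;
import org.apache.mahout.cf.taste.eval.RecommenderBuilder;
import org.apache.mahout.cf.taste.eval.RecommenderIRStatsEvaluator;
import org.apache.mahout.cf.taste.impl.eval.GenericRecommenderIRStatsEvaluator;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.recommender.GenericBooleanPrefItemBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.LogLikelihoodSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.recommender.Recommender;
import org.apache.mahout.cf.taste.similarity.ItemSimilarity;

public class BooleanPrefEvaluation { // class name is illustrative
    public static void main(String[] args) throws Exception {
        // Each line of the CSV is a (userId, itemId) pair.
        FileDataModel model = new FileDataModel(new File("data/test.csv"));
        RecommenderIRStatsEvaluator evaluator = new GenericRecommenderIRStatsEvaluator();
        RecommenderBuilder recommenderBuilder = new RecommenderBuilder() {
            @Override
            public Recommender buildRecommender(DataModel model) {
                ItemSimilarity similarity = new LogLikelihoodSimilarity(model);
                return new GenericBooleanPrefItemBasedRecommender(model, similarity);
            }
        };
        // Precision and recall at 5, evaluated on 100% of the users.
        IRStatistics stats = evaluator.evaluate(
                recommenderBuilder, null, model, null, 5,
                GenericRecommenderIRStatsEvaluator.CHOOSE_THRESHOLD, 1.0);
        System.out.println(stats.getPrecision());
        System.out.println(stats.getRecall());
    }
}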
N Shnat
  • Welcome to Stack Overflow. Please have a look at the [formatting help](http://stackoverflow.com/help/formatting) and try out, for example, _code formatting_ and _simple lists_. Another problem with your post is that it appears to be too broad: it includes more than one question and/or the main question is not clear. Please be more specific. – user1251007 May 21 '14 at 11:30

2 Answers

0

I don't know for sure, but if you seed a random number generator with the same value each time you use it, the sequence of numbers it returns will be identical. Check whether there is a way to seed the RNG with something like the system time. Just a guess.
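
As a small illustration of that behaviour, here is a sketch using plain `java.util.Random`. It says nothing about how Mahout seeds its own generator; `SeedDemo` and `draw` are hypothetical names.

```java
import java.util.Arrays;
import java.util.Random;

// Illustration only: plain java.util.Random, not Mahout's RandomUtils.
final class SeedDemo {

    // Draws `count` integers in [0, bound) from a generator created with the given seed.
    static int[] draw(long seed, int count, int bound) {
        Random random = new Random(seed);
        int[] values = new int[count];
        for (int i = 0; i < count; i++) {
            values[i] = random.nextInt(bound);
        }
        return values;
    }

    public static void main(String[] args) {
        int[] first = draw(42L, 5, 100);
        int[] second = draw(42L, 5, 100);
        // Same seed, same sequence.
        System.out.println(Arrays.equals(first, second)); // prints true
        // A time-based seed will usually give a different sequence on each run.
        System.out.println(Arrays.toString(draw(System.nanoTime(), 5, 100)));
    }
}
```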

pferrel
0

Check out my answer to a related question: How mahout's recommendation evaluator works

I think this will help you understand how the evaluation works, how the relevant items are chosen, and how Precision and Recall are computed.

Dragan Milcevski
  • I did see your answer before posting my question. It is very helpful in explaining the case with actual user ratings. My question is about the case with Boolean preferences, where each entry is simply (userId, itemId), which means that userId liked itemId. Thus, for any threshold (<= 1), the set of relevant movies for each user is equal to all the movies rated by that user (pref = 1). If all of them are moved to the test set, then there is no data from that user left in the training set to make a recommendation. Additionally, this doesn't explain why the value of precision is equal to the value of recall. – N Shnat May 22 '14 at 12:44
  • Mahout in Action, Chapter 2.4.2 (page 23), says: `The issue is further complicated when the preferences are Boolean and contain no preference value. There isn’t even a notion of relative preference on which to select a subset of good items. The best the test can do is randomly select some preferred items as the good ones.` – Dragan Milcevski May 22 '14 at 13:17
  • If the process is random, shouldn't the set of relevant movies selected change each time I run the code, so that I get different values of P each time? Then I could run it many times and average the values to get a reliable number. Also, how does it decide how many preferred items to select randomly, since the user doesn't enter any percentage? Unless it is the same as n (the number of recommendations), why is P = R? – N Shnat May 23 '14 at 06:27
  • Chapter 3.3.3 (page 38) of Mahout in Action has simple code for evaluating precision and recall on Boolean data. From your code I can see that you are not telling Mahout that your data model contains Boolean data. You have to wrap your FileDataModel in a GenericBooleanPrefDataModel like this: `DataModel model = new GenericBooleanPrefDataModel(GenericBooleanPrefDataModel.toDataMap(fileDataModel));` The book Mahout in Action is your Bible for Mahout. You will find most things there: http://manning.com/owen/ – Dragan Milcevski May 23 '14 at 10:56
  • As for your questions: you are using `GenericRecommenderIRStatsEvaluator.CHOOSE_THRESHOLD`, which means that the relevance threshold is NaN, so the function `computeThreshold()` is called to compute it. Inside, it computes the average and the standard deviation of the user's preference values and returns the sum of the two. Since Boolean data has no preference values, Mahout always returns 1.0f for each preference. The average and standard deviation are therefore 1 and 0, and the relevance threshold is always 1. This means that all items are relevant. You can try putting your own number there, e.g. 0.7, to see whether P and R change. – Dragan Milcevski May 23 '14 at 11:12
  • Thanks. Unfortunately, even when I make the code exactly as on page 38, there is no change in my results. It always returns the same values of P and R, which are equal to each other. Specifying threshold values (0.7 etc.) makes no difference either. All user preferences are 1, so changing the threshold cannot accomplish anything. I still can't think of any explanation other than the one in my original post, which is not a very good process for the reasons I specified. – N Shnat May 23 '14 at 12:01
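
The threshold rule described in the comments (average plus standard deviation of a user's preference values) can be sketched in plain Java as follows. This is only an illustration of the arithmetic, not Mahout's implementation: `ThresholdSketch` and `meanPlusStdDev` are hypothetical names, and the population standard deviation is an assumption that may differ from what Mahout computes internally.

```java
// Hypothetical sketch of a "mean + standard deviation" relevance threshold.
// Assumption: population standard deviation; Mahout's own code may differ.
final class ThresholdSketch {

    static double meanPlusStdDev(float[] prefs) {
        if (prefs.length == 0) {
            throw new IllegalArgumentException("at least one preference is required");
        }
        double sum = 0.0;
        for (float pref : prefs) {
            sum += pref;
        }
        double mean = sum / prefs.length;

        double squaredDeviations = 0.0;
        for (float pref : prefs) {
            squaredDeviations += (pref - mean) * (pref - mean);
        }
        double stdDev = Math.sqrt(squaredDeviations / prefs.length);
        return mean + stdDev;
    }

    public static void main(String[] args) {
        // Boolean data: every preference is reported as 1.0f.
        float[] booleanPrefs = {1.0f, 1.0f, 1.0f, 1.0f};
        System.out.println(meanPlusStdDev(booleanPrefs)); // prints 1.0

        // Graded ratings: the threshold lands above the mean of 3 (roughly 4.41).
        float[] ratings = {1.0f, 2.0f, 3.0f, 4.0f, 5.0f};
        System.out.println(meanPlusStdDev(ratings));
    }
}
```

With all preferences equal to 1, the standard deviation is 0 and the threshold is exactly 1, so every preferred item passes it. Any manually chosen threshold at or below 1 selects the same set, which is consistent with the observation that changing the threshold made no difference.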