
Alright guys. My professor says there is a way to write this function without any loops in Python 3, but I'm not seeing it at the moment. She recommends using zip, enumerate, readlines, and split(";"). (Every review is followed by a ';'; two in a row mean that the reviewer did not review the movie.) What I'm doing is reading in a movie and looking up a comparison movie in the movMat list of lists. I then find the reviewers the two movies have in common. After that I have to do the Pearson calculation: collect the common reviewers' ratings for the current movie and for the comparison movie, take the mean and standard deviation of each set of ratings, and finally compute the Pearson r correlation.
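To make the row format concrete, here is a minimal sketch of how one semicolon-separated row splits into per-reviewer fields, and how zip and enumerate can pick out the common reviewers. The two sample strings are made up for illustration and are not taken from the real data file.

```python
# Hypothetical rows: each rating is followed by ';', so an empty field
# (two ';' in a row) means that reviewer did not rate the movie.
row_a = "5;;3;4;;"
row_b = "4;2;;5;;"

fields_a = row_a.split(";")
fields_b = row_b.split(";")

# Reviewer positions where both movies have a non-empty rating.
common = [i for i, (a, b) in enumerate(zip(fields_a, fields_b)) if a and b]
```

Note that because every rating is followed by a ';', split(";") leaves one extra empty string at the end of the list; it is simply treated as "no rating" here.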

import statistics

def pCalc(movMat, movNumber, n):
    # Reviewer positions that hold a valid rating (1-5) for each movie
    indexes1 = [i for i, x in enumerate(movMat[movNumber][1].split(';')) if x in ('1', '2', '3', '4', '5')]
    indexes2 = [i for i, x in enumerate(movMat[n][1].split(';')) if x in ('1', '2', '3', '4', '5')]

    # Reviewers who rated both movies
    compare = set(indexes1).intersection(indexes2)

    # Ratings of the current movie from the common reviewers
    xi = []
    for index, val in enumerate(movMat[movNumber][1].split(';')):
        if index in compare:
            xi.append(int(val))

    average1 = sum(xi)/len(compare)
    stdDev1 = statistics.stdev(xi)

    # Ratings of the comparison movie from the common reviewers
    yi = []
    for index, val in enumerate(movMat[n][1].split(';')):
        if index in compare:
            yi.append(int(val))

    average2 = sum(yi)/len(compare)
    stdDev2 = statistics.stdev(yi)

    # Sum of the products of the standardized ratings
    newSum = 0
    for i in range(len(compare)):
        newSum += ((xi[i]-average1)/stdDev1) * ((yi[i]-average2)/stdDev2)

    r = (1/(len(compare)-1)) * newSum
    return r
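For comparison, here is one possible shape for the same calculation without explicit for statements, as a rough sketch only. It assumes that "no loops" still allows comprehensions and generator expressions, and it takes the two rating strings directly instead of indexing into movMat. The name pearson_no_loops, the VALID set, and the None return for degenerate cases are all hypothetical choices, not part of the assignment.

```python
import statistics

# Assumption: valid ratings are the strings '1' through '5'.
VALID = {"1", "2", "3", "4", "5"}

def pearson_no_loops(row_x, row_y):
    """Pearson r over the reviewers who rated both movies (hypothetical helper)."""
    # zip pairs up the two movies' fields reviewer by reviewer;
    # keep only the pairs where both fields hold a valid rating.
    pairs = [(int(a), int(b))
             for a, b in zip(row_x.split(";"), row_y.split(";"))
             if a in VALID and b in VALID]
    if len(pairs) < 2:
        return None  # stdev needs at least two data points

    # zip(*pairs) turns the list of pairs back into two tuples.
    xi, yi = zip(*pairs)
    mean_x, mean_y = statistics.mean(xi), statistics.mean(yi)
    sd_x, sd_y = statistics.stdev(xi), statistics.stdev(yi)
    if sd_x == 0 or sd_y == 0:
        return None  # r is undefined when one movie's ratings never vary

    total = sum(((x - mean_x) / sd_x) * ((y - mean_y) / sd_y) for x, y in pairs)
    return total / (len(pairs) - 1)
```

With the data layout from the question, a call might look like pearson_no_loops(movMat[movNumber][1], movMat[n][1]). Pairing the fields with zip up front avoids building index lists and then rescanning each row to collect the matching ratings.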

An example input would be:

The main part of the program handles the command-line arguments, reading lines from the file, and so on. As a sample, running it with the command-line argument '1' pulls up Toy Story and compares it to the other movies in the database, like this:

Movie number: Movie  1|Toy Story (1995)

*** No. of rows (movies) in matrix =  1682
*** No. of columns (reviewers) = 943
Output shows r-value, movie no.|name, no. of ratings

compare movie is  1|Toy Story (1995)
no. of common reviewers 452
target avg   3.8783185840707963
compare avg  3.8783185840707963
target std   0.9278967014291252
compare std  0.9278967014291252
r            0.999999999999991

compare movie is  2|GoldenEye (1995)
no. of common reviewers 104
target avg   3.8653846153846154
compare avg  3.201923076923077
target std   0.9456871165874381
compare std  0.9177833965361495
r            0.22178411018797187

compare movie is  3|Four Rooms (1995)
no. of common reviewers 78
target avg   3.717948717948718
compare avg  2.9358974358974357
target std   0.9520645495064435
compare std  1.2096982943568881
r            0.1757942980351483

compare movie is  4|Get Shorty (1995)
no. of common reviewers 149
target avg   3.87248322147651
compare avg  3.530201342281879
target std   0.9247979370536794
compare std  0.9970025819307402
r            0.10313529410109303
  • Pythonic would be using a relevant module like `scipy` or `numpy` to do it for you, but I imagine that isn't allowed. This question may be useful to you: http://stackoverflow.com/questions/3949226/calculating-pearson-correlation-and-significance-in-python – HavelTheGreat Mar 10 '15 at 16:13
  • I could try, but it has to work on the school remote server, and I don't think that we have scipy or numpy downloaded on there. – Richard John Catalano Mar 10 '15 at 16:17
  • It probably wouldn't be what your professor wanted then. [This answer](http://stackoverflow.com/a/5713856/4450134) in particular could prove to be useful for you. – HavelTheGreat Mar 10 '15 at 16:21
  • That's definitely a little helpful, but it only solves one part. How do I cut down on the loops throughout the rest of the equation? – Richard John Catalano Mar 10 '15 at 16:32
  • It's kind of hard to see what the purpose of some of your code is at this point. Could you add some sample input and output? – HavelTheGreat Mar 10 '15 at 16:34
  • How's that? I can also post some of the input file a little later. It's a bit huge, though. – Richard John Catalano Mar 10 '15 at 17:51
