How can you get a value from a third column knowing the values of the first two?

Question

I want to get the value from a third column when the values of the other two columns are given. I want the value of the rating for each of the movies and users to form a user-movie matrix.

I've collected the unique movie ids and user ids into two lists and tried to locate the rows where both columns match the values I want:

import pandas as pd
import numpy as np
import matplotlib as plot

def main():
    df = pd.read_csv(r'/Users/ttbarack/Desktop/ratings.csv')
    #print(df)
    userIds = []
    for id in df['userId']:
        if id not in userIds:
            userIds.append(id)
    #print(userIds)
    movieIds = []
    for movie in df['movieId']:
        if movie not in movieIds:
            movieIds.append(movie)
    #print(movieIds)


    """PART 1"""


    finalList = []
    for id in userIds:
        newlist = []
        for mov in movieIds:
            newlist.append(df['rating'].where(df['userId'].values() == id and df['movieId'].values() == mov))
        finalList.append(newlist)
    print(finalList)

This is the error I'm getting:

Traceback (most recent call last):
  File "/Users/ttbarack/PycharmProjects/Proj1/Project2.py", line 29, in <module>
    main()
  File "/Users/ttbarack/PycharmProjects/Proj1/Project2.py", line 22, in main
    newlist.append(df['rating'].where(df['userId'].values() == id and df['movieId'].values() == mov))
TypeError: 'numpy.ndarray' object is not callable

This looks like it could more easily be accomplished with [boolean masking or df.where](https://pandas.pydata.org/pandas-docs/version/0.23.4/indexing.html#the-where-method-and-masking). For better help, have a look at [How to make good pandas examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) and provide sample input and output — G. Anderson, Nov 04 '19 at 18:27

score 0 · Answer 1 · answered Nov 04 '19 at 18:28

The error occurs because .values is an attribute that holds a NumPy array, not a method, so writing .values() tries to call that array as a function.

Use:

newlist.append(df['rating'].where(df['userId'] == id and df['movieId'] == mov))

instead of

newlist.append(df['rating'].where(df['userId'].values() == id and df['movieId'].values() == mov))

answered Nov 04 '19 at 18:28

roshan ok

I tried that and now I'm getting the error: File "/Users/shrutisrinivasan/PycharmProjects/Proj1/Project2.py", line 27, in main newlist.append(df['rating'].where(df['userId'] == id and df['movieId'] == mov)) File "/Users/shrutisrinivasan/PycharmProjects/Proj1/venv/lib/python3.6/site-packages/pandas/core/generic.py", line 1555, in __nonzero__ self.__class__.__name__ ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all(). – Trevor Barack Nov 06 '19 at 17:49
The goal is to form a matrix with the rows as the user ids and the columns as the movie ids, with the values as the ratings. Can you please help me? – Trevor Barack Nov 06 '19 at 17:51
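One way to build such a matrix without explicit loops is a pivot. The following is a minimal sketch, assuming the CSV has the userId, movieId and rating columns used in the question; the sample rows are made up for illustration and stand in for ratings.csv.

```python
import pandas as pd

# Hypothetical sample data standing in for ratings.csv
# (column names taken from the question, values invented).
df = pd.DataFrame({
    "userId": [1, 1, 2, 3],
    "movieId": [10, 20, 10, 30],
    "rating": [4.0, 3.5, 5.0, 2.0],
})

# Rows are user ids, columns are movie ids, cells are ratings.
# (user, movie) pairs with no rating come out as NaN.
# pivot raises an error if the same pair appears more than once.
matrix = df.pivot(index="userId", columns="movieId", values="rating")
print(matrix)
```

If the same user could rate the same movie more than once, df.pivot_table with an aggregation such as aggfunc="mean" may be a better fit, since it combines duplicates instead of raising.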
Python's and and or operators need a single truth value, which is ambiguous for a pandas Series. Use the element-wise operators & (and) and | (or) instead, and wrap each comparison in parentheses because & binds more tightly than ==, like (df['userId'] == id) & (df['movieId'] == mov) – roshan ok Nov 06 '19 at 18:37
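To make the element-wise form concrete, here is a small sketch of looking up one rating with a boolean mask. The helper name lookup_rating and the sample rows are hypothetical; only the column names come from the question.

```python
import pandas as pd

# Hypothetical sample data (column names from the question, values invented).
df = pd.DataFrame({
    "userId": [1, 1, 2, 3],
    "movieId": [10, 20, 10, 30],
    "rating": [4.0, 3.5, 5.0, 2.0],
})

def lookup_rating(frame, user_id, movie_id):
    """Return the first rating for one (user, movie) pair, or None if there is none."""
    # Each comparison yields a boolean Series; & combines them element-wise.
    # The parentheses are required because & binds more tightly than ==.
    mask = (frame["userId"] == user_id) & (frame["movieId"] == movie_id)
    matches = frame.loc[mask, "rating"]
    return matches.iloc[0] if not matches.empty else None

print(lookup_rating(df, 1, 20))
print(lookup_rating(df, 2, 20))
```

Calling this once per (user, movie) pair scans the whole frame each time, so for a full user-movie matrix a single reshaping call is likely to be much faster than nested loops.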
