
I have two dataframes. The first one is:

       movieId  rating
9414    27914   5.0
9640    31945   5.0
15755   83161   5.0
16444   86975   5.0
17745   92783   5.0
17972   93991   5.0
18206   95494   5.0
18472   96799   5.0
18999   99243   5.0
19994   103875  5.0

The other one looks like this:

movieId tagId   relevance   
1       1   0.02875
1       2   0.02375
1       3   0.06250
1       4   0.07575
1       5   0.14075
... ... ...
206499  1124    0.11000
206499  1125    0.04850
206499  1126    0.01325
206499  1127    0.14025
206499  1128    0.03350

I am trying to filter the second dataframe so that it only includes rows whose movieId also appears in the first dataframe. I've tried this code:

keys = list(df1.movieId)
mask = df2.index.isin(keys)
df2[mask]

I've read up on MultiIndexing, and I think that's what my second df could be considered, but I'm still having a tough time filtering it with this information. Any help or direction is appreciated.
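Two index-related points may explain why the attempt above does not work. If `df2` has a plain RangeIndex, `df2.index.isin(keys)` compares row labels (0, 1, 2, ...) against movie IDs, so it is testing the wrong thing; the `movieId` column should be tested instead. If `movieId` really is a level of a MultiIndex (an assumption; the printout does not make this certain), `Index.isin` accepts a `level=` argument so that only that level is compared. A minimal sketch with made-up toy values:

```python
import pandas as pd

# Toy stand-ins (made-up values) for the question's frames.
df1 = pd.DataFrame({"movieId": [27914, 31945], "rating": [5.0, 5.0]})
df2 = pd.DataFrame({
    "movieId": [1, 1, 27914, 31945],
    "tagId": [1, 2, 1, 1],
    "relevance": [0.02875, 0.02375, 0.5, 0.75],
}).set_index(["movieId", "tagId"])  # assumption: movieId/tagId live in the index

keys = df1["movieId"].unique()

# For a MultiIndex, level= restricts the membership test to the movieId level.
mask = df2.index.isin(keys, level="movieId")
filtered = df2[mask]
print(filtered)
```

If `movieId` is an ordinary column rather than an index level, the column-based filter in the answer applies instead.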

sammywemmy
  • Kindly add your desired output. See [How to create a Minimal, Reproducible Example](https://stackoverflow.com/help/minimal-reproducible-example). – MrNobody33 Jun 22 '20 at 00:01
  • Hi, my desired output would be df2 with every row whose movieId is not in df1 dropped. So it would look like: df2 = { 'movieId' : [27914, 27914, 27914... 31945, 31945, 31945, ..... 103875, 103875,...], 'tagId' : [1, 2, 3, ... 1, 2, 3, ... 1, 2...], 'relevance' : [float64, float64, float64...] } Hope that helps! – Qgerdes Jun 23 '20 at 00:14

1 Answer


You can filter with isin on the movieId column:

df2 = df2.loc[df2.movieId.isin(df1.movieId)]
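A small self-contained run of the same idea, using made-up toy frames shaped like the ones in the question. It produces the kind of output described in the comments: only the rows for movie IDs present in `df1`, with all their tagId/relevance pairs intact.

```python
import pandas as pd

# Toy stand-ins for the question's frames (values are made up).
df1 = pd.DataFrame({"movieId": [27914, 31945], "rating": [5.0, 5.0]})
df2 = pd.DataFrame({
    "movieId": [1, 1, 27914, 27914, 31945, 31945],
    "tagId": [1, 2, 1, 2, 1, 2],
    "relevance": [0.02875, 0.02375, 0.5, 0.25, 0.75, 0.125],
})

# Boolean Series: True where df2's movieId appears anywhere in df1's movieId.
mask = df2["movieId"].isin(df1["movieId"])

# .loc with a boolean Series keeps only the True rows; reset_index is optional.
filtered = df2.loc[mask].reset_index(drop=True)
print(filtered)
```

If the result comes back empty instead, it is worth checking that both movieId columns share a dtype; for example, the string '27914' does not match the integer 27914.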
BENY
  • I tried this already, but for some reason it returns a dataframe of all null values. Do you know why that would be? – Qgerdes Jun 22 '20 at 13:17
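The exact code that produced the nulls is not shown, so the following is only a guess at one common cause: indexing a DataFrame with a boolean *DataFrame* (for example `df2[df2.isin(keys)]`) does not drop rows; it behaves like `DataFrame.where` and replaces every non-matching cell with NaN. Indexing with a boolean *Series* built from the movieId column drops rows instead. A sketch with made-up toy data:

```python
import pandas as pd

# Made-up toy data; the real frames are larger.
df1 = pd.DataFrame({"movieId": [27914, 31945], "rating": [5.0, 5.0]})
df2 = pd.DataFrame({
    "movieId": [1, 27914, 31945],
    "tagId": [1, 1, 1],
    "relevance": [0.02875, 0.5, 0.75],
})
# A plain list: DataFrame.isin would align a Series on its index instead.
keys = df1["movieId"].tolist()

# Boolean Series key: selects whole rows, non-matching rows are dropped.
rows = df2[df2["movieId"].isin(keys)]

# Boolean DataFrame key: acts like df2.where(...), so every cell whose value
# is not in keys becomes NaN and no rows are dropped.
masked = df2[df2.isin(keys)]

print(rows)
print(masked)
```

In `masked`, the tagId and relevance columns are entirely NaN, which would look like "a dataframe of all null values".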