My question is similar to Pandas: Selecting rows based on value counts of a particular column but with TWO columns:
This is a very small snippet from the dataframe (The main df contains millions of entries):
overall vote verified reviewTime reviewerID productID
4677505 5.0 NaN True 11 28, 2017 A2O8EJJBFJ9F1 B00NR2VMNC
1302483 5.0 NaN True 04 1, 2017 A1YMYW7EWN4RL3 B001S2PPT0
5073908 3.0 83 True 02 12, 2016 A3H796UY7GIX0K B00ULRFQ1A
200512 5.0 NaN True 07 14, 2016 A150W68P8PYXZE B0000DC0T3
1529831 5.0 NaN True 12 19, 2013 A28GVVNJUZ3VFA B002WE3BZ8
1141922 5.0 NaN False 12 20, 2008 A2UOHALGF2X77Q B001CCLBSA
5930187 3.0 2 True 05 21, 2018 A2CUSR21CZQ6J7 B01DCDG9JC
1863730 5.0 NaN True 05 6, 2017 A38A3VQL8RLS8D B004HKIB6E
1835030 5.0 NaN True 06 20, 2016 A30QT3MWWEPNIE B004D09HRK
4226935 5.0 NaN True 12 27, 2015 A3UORFPF49N96B B00JP12170
Now I want to filter the dataframe so that each reviewerID and productID appears at least k times (lets say k=2) in the final filtered dataframe. In other words: That each user and product has at least k distinct entries/rows.
I would greatly appreciate any help.