I am working within PySpark, and have a transaction table imported as a Spark DataFrame as follows:
User_ID   Date         Product_Name
-------   ----------   ------------
A         2019-11-30   Product 1
B         2019-10-20   Product 2
C         2019-10-01   Product 1
A         2019-12-01   Product 1
What I am trying to do is create a result table with one row per unique User_ID. Its second column should contain the string "Product 1" if that user has bought more of Product 1 than Product 2, and "Product 2" otherwise.
I am finding this difficult to do in PySpark.