I have a dataframe called hashtags_users_grouped with the following structure:
hashtag_id | user_id | count
123        | 1       | 1
245        | 1       | 3
123        | 2       | 5
Each row records which hashtag a user mentioned and how many times they mentioned it. In this example, user 1 mentioned hashtag 123 once and hashtag 245 three times, while user 2 mentioned only hashtag 123, five times.
I want to have a dataframe with the following output:
user | 123 | 245
1    | 1   | 3
2    | 5   | 0
In other words, the same information as the first table, but with one row per user and one column per hashtag, so that each cell holds the number of times that user mentioned that hashtag (0 if never). I read the documentation and tried to run the following, without success:
a = hashtags_users_joined_grouped_df.groupBy("user_id").pivot("hashtag_id")
a.show(5)
I got the following error message:
AttributeError: 'GroupedData' object has no attribute 'show'
Do you know any way to do this?
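To make the expected result unambiguous, here is a plain-Python sketch (not Spark) of the reshaping I am after. The rows are the three from the example table above, and pivot_counts is just a hypothetical helper name used for illustration; the real data lives in a Spark dataframe.

```python
from collections import defaultdict

# Rows of (hashtag_id, user_id, count), copied from the example table.
rows = [(123, 1, 1), (245, 1, 3), (123, 2, 5)]


def pivot_counts(rows):
    """Hypothetical helper: one row per user, one column per hashtag.

    Returns the sorted hashtag ids (the column order) and a dict that maps
    each user_id to its list of counts, with 0 where a user never
    mentioned a hashtag.
    """
    hashtag_ids = sorted({hashtag_id for hashtag_id, _, _ in rows})
    user_ids = sorted({user_id for _, user_id, _ in rows})

    # Sum the counts per (user, hashtag) pair in case a pair repeats.
    totals = defaultdict(int)
    for hashtag_id, user_id, count in rows:
        totals[(user_id, hashtag_id)] += count

    table = {
        user_id: [totals.get((user_id, h), 0) for h in hashtag_ids]
        for user_id in user_ids
    }
    return hashtag_ids, table


columns, table = pivot_counts(rows)
print("user", *columns)
for user_id, counts in table.items():
    print(user_id, *counts)
```

I am looking for the Spark equivalent of this, ideally without collecting the data to the driver.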