I want to set up my dataframe so that it groups by my ID column, but have many columns for my categorical variables and interaction effects.
So this is how the original table looks like.
+----+----------------+---------+
| ID | Page | Click |
+----+----------------+---------+
| 1 | homepage | logo |
| 1 | homepage | search |
| 1 | category page | logo |
| 1 | category page | search |
| 2 | homepage | logo |
| 2 | homepage | search |
| .. | | |
+----+----------------+---------+
I would like to make it into a table like this.
+----+----------------+--------------------+------------+---------------+-----------------+----------------------+---------------+-------------------+
| ID | Page_homepage | Page_categorypage | Click_logo | Click_search | homepage:search | categorypage:search | homepage:logo | categorypage:logo |
+----+----------------+--------------------+------------+---------------+-----------------+----------------------+---------------+-------------------+
| 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| 2 | 1 | 0 | 1 | 1 | 1 | 0 | 1 | 0 |
+----+----------------+--------------------+------------+---------------+-----------------+----------------------+---------------+-------------------+
My objective is to be able to create features with interaction effects to perform a logistic regression. There are outputs associated with each ID, so it's important for me to group the results by ID.
What is the best and simplest way to do this? I don't want to manually do it for all the possible variations. I'm indifferent between using R/Python/SQL to perform this.