I'm looking to optimise the following problem [simplified version here]:
I have two data frames, the first contains the information.
user_id | game_id | score | ON |
---|---|---|---|
1 | 1 | 450 | 1 |
1 | 2 | 200 | 1 |
1 | 3 | 400 | 1 |
2 | 1 | 225 | 1 |
2 | 2 | 150 | 1 |
2 | 3 | 200 | 1 |
The second contains the conditions.
game_id | game_id_ref | req_score | type |
---|---|---|---|
2 | 1 | 150 | 1 |
3 | 1 | 200 | 1 |
1 | 1 | 400 | 2 |
3 | 2 | 175 | 1 |
The conditions should be evaluated on the information data frame in the following way.
The conditions with type == 1 describe TURN ON conditions, and enforce that a game can only TURN ON if the score on the game from the game_id_ref >= req_score, so the first row from the conditions should be read as; the game with game_id == 2 can only TURN ON for user X when they have a score of 150 or higher on the game with game_id == 1.
The conditions with type == 2 describe TURN OFF conditions, and enforce that a game must be TURNED OFF if the score on the game from the game_id_ref >= req_score, so the third row from the conditions should be read as; for user X the game with game_id == 1 must be TURNED OFF when they have a score of 400 or higher on the game with game_id == 1.
In the information data frame I have a column ON which indicates if a game is ON for a particular user. The default is 1 [the game is ON] but this is before evaluating all the conditions. I am looking for the fastest way to evaluate the conditions for each user separately, and return the same information data frame, however now with ON = 0 if for a user the game fails to meet criteria type 1 or met criteria type 2.
So for this mock example, the required output would be:
user_id | game_id | score | ON |
---|---|---|---|
1 | 1 | 450 | 0 |
1 | 2 | 200 | 1 |
1 | 3 | 400 | 1 |
2 | 1 | 225 | 1 |
2 | 2 | 150 | 1 |
2 | 3 | 200 | 0 |
My current solution has been to create a separate function in which I check this by applying a for_loop over all the rows of the conditions table [approx 100 conditions], and using this function in a group_map function, on the information data frame grouped by the user_ids [approx 350000 unique users]. While this works relatively ok [approx 10 min], I would like to know if someone has a much faster solution for this.
Thanks!