I have two DataFrames; let's call them group_user_log and group_train:
group_user_log
user_id  server_time  session_id
1        2018-01-01   435
1        2018-01-01   435
1        2018-01-04   675
1        2018-01-05   454
1        2018-01-05   454
1        2018-01-06   920
group_train
user_id  impression_time  totalcount  distinct_count
1        2018-01-03       0           0
1        2018-01-05       0           0
The logic: for each row of group_train, take the rows of group_user_log for the same user_id whose server_time is strictly earlier than impression_time, then write the total number of those rows to totalcount and the number of distinct session_id values to distinct_count. The expected output for group_train is:
user_id  impression_time  totalcount  distinct_count
1        2018-01-03       2           1
1        2018-01-05       3           2
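A minimal sketch of the logic described above, assuming pandas and that server_time and impression_time are already parsed as datetimes. It joins the two frames on user_id, keeps only log rows strictly earlier than each impression, and aggregates with count and nunique. The helper name counts_via_merge is hypothetical. It is simple, but the intermediate join has one row per (impression, log row) pair for each user, so memory grows quickly for users with many rows.

```python
import pandas as pd


def counts_via_merge(log: pd.DataFrame, train: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: count log rows strictly before each impression."""
    keys = train[["user_id", "impression_time"]].drop_duplicates()
    # Pair every impression with every log row of the same user.
    pairs = keys.merge(log, on="user_id", how="inner")
    # Keep only log rows that happened strictly before the impression.
    pairs = pairs[pairs["server_time"] < pairs["impression_time"]]
    # Total rows and distinct session_ids per (user, impression).
    agg = (
        pairs.groupby(["user_id", "impression_time"])["session_id"]
        .agg(totalcount="count", distinct_count="nunique")
        .reset_index()
    )
    # Left join back so impressions with no earlier sessions get 0.
    out = train.drop(columns=["totalcount", "distinct_count"]).merge(
        agg, on=["user_id", "impression_time"], how="left"
    )
    cols = ["totalcount", "distinct_count"]
    out[cols] = out[cols].fillna(0).astype(int)
    return out


group_user_log = pd.DataFrame({
    "user_id": [1, 1, 1, 1, 1, 1],
    "server_time": pd.to_datetime(["2018-01-01", "2018-01-01", "2018-01-04",
                                   "2018-01-05", "2018-01-05", "2018-01-06"]),
    "session_id": [435, 435, 675, 454, 454, 920],
})
group_train = pd.DataFrame({
    "user_id": [1, 1],
    "impression_time": pd.to_datetime(["2018-01-03", "2018-01-05"]),
    "totalcount": [0, 0],
    "distinct_count": [0, 0],
})

result = counts_via_merge(group_user_log, group_train)
print(result)
```

On the sample data this reproduces the expected totalcount of 2 and 3 and distinct_count of 1 and 2.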
I tried doing this row by row, but that is slow and very inefficient on larger DataFrames. The data above is a subset for a single user_id taken from two large DataFrames, and the same calculation has to be done for a large number of user_ids, so I am looking for a more efficient approach.
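One way to avoid both the row-by-row loop and a pairwise join, sketched here under the assumption that the time columns are datetimes with no nulls, is to precompute running counts on the log and look them up with pandas.merge_asof. The hypothetical helper counts_via_asof sorts the log by time, computes a per-user running row count and a running count of first-seen session_ids, then, for each impression, picks the last log row of the same user whose server_time is strictly earlier (by="user_id", allow_exact_matches=False). Counting a session at its first appearance gives the same number as nunique over the earlier rows.

```python
import pandas as pd


def counts_via_asof(log: pd.DataFrame, train: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: running counts plus an as-of lookup per user."""
    log = log.sort_values("server_time", kind="stable").reset_index(drop=True)
    # Running number of log rows per user, in time order.
    log["totalcount"] = log.groupby("user_id").cumcount() + 1
    # A session_id counts toward the distinct total at its first appearance.
    first_seen = ~log.duplicated(["user_id", "session_id"])
    log["distinct_count"] = first_seen.astype(int).groupby(log["user_id"]).cumsum()

    # merge_asof needs the left frame sorted on its "on" key; remember the
    # original row order in a helper column so it can be restored afterwards.
    left = (
        train.drop(columns=["totalcount", "distinct_count"])
        .assign(_row=range(len(train)))
        .sort_values("impression_time", kind="stable")
    )
    out = pd.merge_asof(
        left,
        log[["user_id", "server_time", "totalcount", "distinct_count"]],
        left_on="impression_time",
        right_on="server_time",
        by="user_id",
        allow_exact_matches=False,  # only strictly earlier server_time
        direction="backward",       # last such log row for this user
    )
    cols = ["totalcount", "distinct_count"]
    # Impressions with no earlier log rows get NaN from merge_asof -> 0.
    out[cols] = out[cols].fillna(0).astype(int)
    out = out.sort_values("_row").drop(columns=["_row", "server_time"])
    return out.reset_index(drop=True)


group_user_log = pd.DataFrame({
    "user_id": [1, 1, 1, 1, 1, 1],
    "server_time": pd.to_datetime(["2018-01-01", "2018-01-01", "2018-01-04",
                                   "2018-01-05", "2018-01-05", "2018-01-06"]),
    "session_id": [435, 435, 675, 454, 454, 920],
})
group_train = pd.DataFrame({
    "user_id": [1, 1],
    "impression_time": pd.to_datetime(["2018-01-03", "2018-01-05"]),
    "totalcount": [0, 0],
    "distinct_count": [0, 0],
})

result = counts_via_asof(group_user_log, group_train)
print(result)
```

The work is dominated by the sorts, so it should scale roughly as O(n log n) in the number of log and impression rows, instead of the per-user product that the pairwise join creates. Tied server_time values are handled because merge_asof takes the last qualifying row, which carries the largest running counts.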
Thanks for your help!!