I have a data frame called predictors with columns session_id and item_id.
For each session, I want the counts (over the whole data frame) of every item that belongs to that session.
I have used aggregate like this:
popularity <- aggregate(predictors$item_id,
FUN = function(items) {(table(predictors$item_id[predictors$item_id %in% items]))},
by = list(predictors$session_id))
This computes, for each session, the list of counts (throughout predictors) of all items that belong to that session.
E.g., if there are two records, session1 - item1 and session1 - item2, I would like to get the list of counts (in the whole predictors data frame) of item1 and item2 against session1 (something like session1 - (10, 20), when item1 appears 10 times in the dataset, and so on).
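To make the desired output concrete, here is a minimal base R sketch on toy data. The session and item values are made up for illustration and are not from my real dataset; the names item_counts and expected are likewise only placeholders.

```r
# Toy data (hypothetical values, only for illustration)
predictors <- data.frame(
  session_id = c("session1", "session1", "session2", "session2", "session3"),
  item_id    = c("item1", "item2", "item1", "item3", "item1"),
  stringsAsFactors = FALSE
)

# Count every item once over the whole data frame
item_counts <- table(predictors$item_id)

# For each session, look up the global counts of the items it contains
expected <- lapply(
  split(predictors$item_id, predictors$session_id),
  function(items) item_counts[unique(items)]
)

print(expected$session1)
```

Here item1 occurs three times overall and item2 once, so session1 should map to those two counts.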
This works with the aggregate approach above, but I want to make it faster using data.table.
So far I have tried the following with data.table:
predictors_data.table <- data.table(predictors)
popularity <- predictors_data.table[ , list(p = table(predictors_data.table$item_id[items_list %in% item_id])),
by = c('session_id')]
but I am only getting the count for the first item, not for all items in a given session.
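One direction that might work, sketched here on toy data as an assumption rather than a verified fix for the real dataset: count each item once over the whole table, join those counts back onto the distinct (session, item) pairs, and then collapse to one row per session. The table dt, the column count, and the intermediate name pairs are hypothetical names introduced for this sketch.

```r
library(data.table)

# Toy data (hypothetical values, only for illustration)
dt <- data.table(
  session_id = c("session1", "session1", "session2", "session2", "session3"),
  item_id    = c("item1", "item2", "item1", "item3", "item1")
)

# Step 1: count each item once over the whole table
item_counts <- dt[, .(count = .N), by = item_id]

# Step 2: attach the global count to each distinct (session, item) pair
pairs <- unique(dt[, .(session_id, item_id)])
popularity <- merge(pairs, item_counts, by = "item_id")

# Step 3: one row per session; list() makes p a list column, so each
# session keeps all of its counts instead of a single value
popularity <- popularity[, .(p = list(count)), by = session_id]

print(popularity)
```

Computing the counts once up front avoids re-tabulating the whole item_id column for every session, which is presumably where most of the time goes in the aggregate version.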