Separately counting two exact string matches in data.table (and aggregating at the same time)

Question

I'm trying to count, per post, how often `event.is_comment_to_post` is "false" and how often it is "true" in a Facebook dataset. This tells me whether each event is a comment on a post or a comment on another comment.

I'd like to do this in data.table, since I've found it is often the fastest and most readable option. The code below works, but I would like to do the whole operation in one line:

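# Name the count column in j so the merged table gets CEM_CtC / CEM_CtP
# instead of the default N.x / N.y
CEM_CtC <- aggregate_comments_data[event.is_comment_to_post == "false", .(CEM_CtC = .N), by = event.post.id]
CEM_CtP <- aggregate_comments_data[event.is_comment_to_post == "true", .(CEM_CtP = .N), by = event.post.id]
CEM_post_data <- merge(CEM_CtC, CEM_CtP, by = "event.post.id", all = TRUE)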

The output table must be formatted like this:

event.post.id CEM_CtC CEM_CtP
    382719578      50     100
    238947597      50     100
    934829234      50     100
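A sketch of a single-call alternative, not taken from the thread: inside one grouped `[` call, `sum()` over a logical comparison counts the matching rows, so each named column counts one string value per post. The table below is hypothetical toy data assumed to mirror `aggregate_comments_data`, with `event.is_comment_to_post` stored as the strings "true" and "false".

```r
library(data.table)

# Hypothetical toy data standing in for aggregate_comments_data;
# post IDs and rows are made up for illustration.
aggregate_comments_data <- data.table(
  event.post.id = c(382719578, 382719578, 382719578, 238947597, 238947597),
  event.is_comment_to_post = c("true", "false", "true", "true", "true")
)

# One grouped call: sum() over a logical vector counts the TRUEs,
# so each output column counts one of the two string values per post.
CEM_post_data <- aggregate_comments_data[, .(
  CEM_CtC = sum(event.is_comment_to_post == "false"),
  CEM_CtP = sum(event.is_comment_to_post == "true")
), by = event.post.id]

print(CEM_post_data)
```

Unlike the `merge(..., all = TRUE)` route, a post with no rows of one kind gets 0 here rather than `NA`.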

Accepted answer by eddi


Untested, since you don't have a reproducible example, but something like this will work:

dcast(aggregate_comments_data, event.post.id ~ event.is_comment_to_post, fun.aggregate = length)
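A minimal runnable sketch of this answer on hypothetical toy data (the post IDs and rows are made up, and the string values "true"/"false" are assumed). `dcast` names the wide columns after the values themselves, so a `setnames()` step, which is an addition not in the answer, maps them to the requested `CEM_CtC`/`CEM_CtP` names.

```r
library(data.table)

# Hypothetical toy data standing in for aggregate_comments_data.
aggregate_comments_data <- data.table(
  event.post.id = c(382719578, 382719578, 382719578, 238947597, 238947597),
  event.is_comment_to_post = c("true", "false", "true", "true", "true")
)

# Cast to wide form: one row per post, one column per distinct value of
# event.is_comment_to_post, each cell holding the number of matching rows.
CEM_post_data <- dcast(aggregate_comments_data,
                       event.post.id ~ event.is_comment_to_post,
                       fun.aggregate = length)

# dcast names the new columns after the values ("false", "true");
# rename them to the layout the question asks for.
setnames(CEM_post_data, c("false", "true"), c("CEM_CtC", "CEM_CtP"))

print(CEM_post_data)
```

`dcast` may print a message about which column it guessed as `value.var`; with `length` as the aggregate, that choice does not change the counts. Because `length` of an empty group is 0, a post missing one kind of comment gets 0, and the rows come back sorted by `event.post.id`. `setnames()` stops with an error if either value never occurs in the data.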

answered Oct 05 '18 at 14:14 by eddi (edited Oct 05 '18 at 14:30)


I think `DT[, .N, by=.(x,y)][, dcast(.SD, x ~ y)]` is equivalent to `dcast(DT, x ~ y, fun.aggregate = length)`..? – Frank Oct 05 '18 at 14:22

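@Frank you're right. You can even omit `fun.aggregate` if the data is non-trivial (some post/value combination occurs more than once), since `dcast` then defaults to `length`. – eddi Oct 05 '18 at 14:31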
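A sketch checking Frank's equivalence on a hypothetical two-column table `DT` (the columns `x` and `y` are placeholders). The count-first route is written with an intermediate table rather than the `.SD` chain, and it passes `fill = 0L`: without it, combinations that never occur come out as `NA`, whereas `fun.aggregate = length` fills them with 0.

```r
library(data.table)

# Hypothetical toy table: x is the grouping column, y the category column.
DT <- data.table(
  x = c("a", "a", "a", "b", "b"),
  y = c("true", "false", "true", "true", "true")
)

# Route 1: count rows per (x, y) first, then spread the counts wide.
# Each (x, y) pair now occurs once, so no aggregate function is needed;
# fill = 0L replaces the NA that missing pairs (here b/false) would get.
counts <- DT[, .N, by = .(x, y)]
counted_then_cast <- dcast(counts, x ~ y, value.var = "N", fill = 0L)

# Route 2: let dcast do the counting itself.
cast_with_length <- dcast(DT, x ~ y, fun.aggregate = length)

print(counted_then_cast)
print(cast_with_length)
```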