
I'm trying to count the occurrences of "false" and "true" in a column of a Facebook dataset. The column indicates whether an event is a comment on a post or a comment on a comment.

I would like to do this in data.table, as I've found it is often the fastest and most readable way to do this kind of operation. The code below works, but I would like to do the whole operation in one line.

CEM_CtC <- aggregate_comments_data[event.is_comment_to_post == "false", .N, by = event.post.id]
CEM_CtP <- aggregate_comments_data[event.is_comment_to_post == "true", .N, by = event.post.id]
CEM_post_data <- merge(CEM_CtC, CEM_CtP, by = "event.post.id", all = TRUE)

It is essential for the process that the resulting table is formatted like this:

event.post.id CEM_CtC CEM_CtP
    382719578      50     100
    238947597      50     100
    934829234      50     100
Cœur
Bram

1 Answer


Untested, since you don't have a reproducible example, but something like this will work:

dcast(aggregate_comments_data, event.post.id ~ event.is_comment_to_post, fun.aggregate = length)
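
For anyone who wants to try this, here is a minimal sketch on made-up data. The column names come from the question, but the values are hypothetical, and the rename step is my assumption about how to get the CEM_CtC / CEM_CtP headers, since dcast names the new columns after the values it finds ("false" and "true").

```r
library(data.table)

# Hypothetical toy data: column names follow the question, values are invented.
aggregate_comments_data <- data.table(
  event.post.id = c(1L, 1L, 1L, 2L, 2L, 3L),
  event.is_comment_to_post = c("true", "false", "true", "true", "true", "false")
)

# One row per post, one column per distinct value of event.is_comment_to_post.
# value.var is named explicitly so dcast does not have to guess it.
CEM_post_data <- dcast(
  aggregate_comments_data,
  event.post.id ~ event.is_comment_to_post,
  fun.aggregate = length,
  value.var = "event.is_comment_to_post"
)

# Rename the value-derived columns to the headers requested in the question.
# This assumes both "false" and "true" occur somewhere in the data.
setnames(CEM_post_data, c("false", "true"), c("CEM_CtC", "CEM_CtP"))

print(CEM_post_data)
```

If the requirement is strictly one line, the setnames() call can wrap the dcast() call, at some cost in readability.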
eddi
    I think `DT[, .N, by=.(x,y)][, dcast(.SD, x ~ y)]` is equivalent to `dcast(DT, x ~ y, fun.aggregate = length)`..? – Frank Oct 05 '18 at 14:22
    @Frank you're right. Can even omit `fun.aggregate` if data is non-trivial. – eddi Oct 05 '18 at 14:31
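
To illustrate the two forms discussed in the comments, here is a small sketch on a hypothetical table that reuses the x / y names from above. As far as I can tell they agree wherever a combination is present, but they treat absent combinations differently: counting first leaves NA, while fun.aggregate = length fills with 0 (the length of an empty vector) unless fill is set.

```r
library(data.table)

# Hypothetical toy table; x = 2 has no "false" rows.
DT <- data.table(x = c(1L, 1L, 2L), y = c("false", "true", "true"))

# Count first, then reshape. value.var is named to avoid the guess message.
counted <- dcast(DT[, .N, by = .(x, y)], x ~ y, value.var = "N")

# Reshape and count in a single call.
direct <- dcast(DT, x ~ y, fun.aggregate = length, value.var = "y")

# Same two-step form, with missing cells filled explicitly.
counted_filled <- dcast(DT[, .N, by = .(x, y)], x ~ y, value.var = "N", fill = 0L)

print(counted)
print(direct)
print(counted_filled)
```

The merge(..., all = TRUE) approach in the question behaves like the first form here, so which variant is appropriate depends on whether a missing count should read as NA or as 0.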