Suppose I have a data.table of users and the movies they watched, for example:
library(data.table)
DT = fread("
User, Movie
Alice, Fight Club
Alice, The Godfather
Bob, Titanic
Charlotte, The Godfather")
I want to compute, for each pair of movies, the number of people who watched at least one of the two (WatchedOne) and the number of people who watched both (WatchedBoth), i.e.
Movie1         Movie2         WatchedOne  WatchedBoth
Fight Club     The Godfather  2           1
The Godfather  Titanic        3           0
Fight Club     Titanic        2           0
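To pin down the definition, here is a minimal, unoptimized sketch of the computation on the sample data. It assumes each (User, Movie) pair should count once, and it uses inclusion-exclusion: WatchedOne = viewers of Movie1 + viewers of Movie2 - WatchedBoth. The helper names (counts, pairs, both, res) are only illustrative, and the self-join plus combn() enumerate every pair, so this is a reference for the expected result rather than something that would scale.

```r
library(data.table)

# Same sample data as above, built directly to keep the sketch self-contained
DT <- data.table(
  User  = c("Alice", "Alice", "Bob", "Charlotte"),
  Movie = c("Fight Club", "The Godfather", "Titanic", "The Godfather")
)
DT <- unique(DT)  # assumption: a user counts once per movie

# Number of distinct viewers per movie
counts <- DT[, .(Viewers = .N), by = Movie]

# Every unordered pair of movies (Movie1 sorts before Movie2)
pairs <- as.data.table(t(combn(sort(counts$Movie), 2)))
setnames(pairs, c("Movie1", "Movie2"))

# Self-join on User: one row per user and ordered movie pair,
# then keep each unordered pair once and count the users
both <- merge(DT, DT, by = "User", allow.cartesian = TRUE)[
  Movie.x < Movie.y,
  .(WatchedBoth = .N),
  by = .(Movie1 = Movie.x, Movie2 = Movie.y)
]

# Attach the counts; pairs with no common viewer get WatchedBoth = 0
res <- merge(pairs, both, by = c("Movie1", "Movie2"), all.x = TRUE)
res[is.na(WatchedBoth), WatchedBoth := 0L]
res[counts, on = .(Movie1 = Movie), N1 := i.Viewers]
res[counts, on = .(Movie2 = Movie), N2 := i.Viewers]

# Inclusion-exclusion: |A or B| = |A| + |B| - |A and B|
res[, WatchedOne := N1 + N2 - WatchedBoth]
res[, c("N1", "N2") := NULL]
setcolorder(res, c("Movie1", "Movie2", "WatchedOne", "WatchedBoth"))
print(res)
```

On the sample data this should reproduce the three rows of the table above, possibly in a different row order.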
I have millions of rows, so I need a very fast data.table solution :-)
Thanks for your help!