I have a dataset recording, for each user, whether certain events were successful. An example with 2 users and 3 distinct events:
data.frame(
  id = c('A', 'A', 'A', 'B', 'B', 'B'),
  event = c('score', 'pass', 'dribble', 'score', 'pass', 'dribble'),
  success = c(1, 1, 1, 0, 1, 1)
)
# id event success
# A score 1
# A pass 1
# A dribble 1
# B score 0
# B pass 1
# B dribble 1
I would like to measure the relation between events: how often two events are both successful for the same user. When event 1 is achieved, is event 2 often achieved too? Are event 1 and event 2 correlated?
In this example with 2 users, both succeeded at pass and dribble, but only one succeeded at score. Taking corr as the share of users who succeeded at both events, the expected output is:
data.frame(
  event1 = c('score', 'score', 'pass'),
  event2 = c('pass', 'dribble', 'dribble'),
  corr = c(0.5, 0.5, 1)
)
# event1 event2 corr
# score pass .5
# score dribble .5
# pass dribble 1
Such a table will help me build a network and weight and highlight the links between the distinct events. Thank you in advance.
I can imagine a solution with a for loop, but I guess there is something more elegant. :)
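For reference, here is a minimal loop-free sketch in base R of the kind of thing I have in mind. It assumes that corr means the share of users who succeeded at both events, and that each user has at most one row per event. The names `df`, `m`, `both`, `pairs` and `result` are only illustrative.

```r
# Example data from the question
df <- data.frame(
  id = c('A', 'A', 'A', 'B', 'B', 'B'),
  event = c('score', 'pass', 'dribble', 'score', 'pass', 'dribble'),
  success = c(1, 1, 1, 0, 1, 1)
)

# Keep events in order of first appearance so the pairs match the expected output
events <- unique(df$event)
df$event <- factor(df$event, levels = events)

# User-by-event matrix of successes (rows: users, columns: events)
# Assumption: at most one row per user and event, so every cell is 0 or 1
m <- xtabs(success ~ id + event, data = df)

# Cell [i, j] counts the users who succeeded at both event i and event j
both <- crossprod(m)

# One row per unordered pair of distinct events
pairs <- t(combn(events, 2))
result <- data.frame(
  event1 = pairs[, 1],
  event2 = pairs[, 2],
  corr = both[pairs] / nrow(m)  # share of users with both successes
)
result
```

On the example data this reproduces the corr values 0.5, 0.5 and 1 from the expected output. Dividing by `nrow(m)` is my own choice of normalisation; a different denominator (for instance, the users who succeeded at event 1) would give a different measure.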