I'm trying to understand cross-domain behavior across multiple websites. I have the following data:

Website       ClientID     SessionId
-------------------------------------
domain1          xxx         d.0686
domain2          xxx         d.0686
domain3          yyy         f.1871
domain2          yyy         f.1871
domain4          yyy         f.1871
domain1          zzz         n.9210
domain2          zzz         n.9210

People can move across multiple websites, but they keep the same ClientID (stored as a cookie) and the same SessionId (shared between websites when a person moves from one domain to another).

I need to see how many SessionIds each pair of websites shares. I guess the easiest way is to create a matrix counting the unique SessionIds they have in common. This would be the result based on the table above:

            domain1     domain2     domain3     domain4
-----------------------------------------------------------
domain1        0           2           0           0
domain2        2           0           1           1
domain3        0           1           0           0
domain4        0           1           0           0

This way I can count how many times two different websites are used in the same SessionId and create a chord diagram with the circlize package to visualise the relationship.

Is it possible to do this in R?

Jaap
  • 4
    Yes, it is possible.. But SO is not a code producing factory. Show your own effort/code first, and the users will gladly help you improve it. – Wimpel Oct 17 '18 at 17:22
  • 2
    The following should be a compulsory read for every new contributor. Give it a read John. https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – Hector Haffenden Oct 17 '18 at 17:40

1 Answer

# Square matrix with one row and one column per website
domains <- unique(information$Website)
output <- matrix(0, length(domains), length(domains))
colnames(output) <- rownames(output) <- domains

# For every pair of websites, count the session IDs that appear on both
for (x in domains) {
  X <- unique(information[information$Website == x, 'SessionId'])
  for (y in domains) {
    Y <- unique(information[information$Website == y, 'SessionId'])
    output[x, y] <- length(intersect(X, Y))
  }
}

print(output)

#        domain1 domain2 domain3 domain4
#domain1       2       2       0       0
#domain2       2       3       1       1
#domain3       0       1       1       1
#domain4       0       1       1       1
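
The same counts can probably be obtained without the nested loop. The sketch below is an alternative, not part of the original approach: it builds a 0/1 session-by-website incidence matrix with `table()` and multiplies it by its own transpose with `crossprod()`. The names `pairs`, `incidence` and `shared` are made up for illustration, and the toy data is retyped from the question so the block runs on its own. Zeroing the diagonal is an assumption based on the layout the question asks for.

```r
# Toy data retyped from the question's table
information <- data.frame(
  Website   = c("domain1", "domain2", "domain3", "domain2",
                "domain4", "domain1", "domain2"),
  SessionId = c("d.0686", "d.0686", "f.1871", "f.1871",
                "f.1871", "n.9210", "n.9210"),
  stringsAsFactors = FALSE
)

# One row per (session, website) pair, so repeated hits count once
pairs <- unique(information[, c("SessionId", "Website")])

# Session-by-website incidence matrix: 1 if the session touched the site
incidence <- unclass(table(pairs$SessionId, pairs$Website))

# t(incidence) %*% incidence: entry [i, j] is the number of sessions
# that appear on both website i and website j
shared <- crossprod(incidence)

# Assumption: the question wants 0 on the diagonal (a site paired with itself)
diag(shared) <- 0

print(shared)
```

Note that domain3 and domain4 come out as 1 here, as in the loop's output, because both appear in session f.1871.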

Data

information <- structure(list(Website = c("domain1", "domain2", "domain3", "domain2", "domain4", 
                                          "domain1", "domain2"), 
                              ClientID = c("xxx", "xxx", "yyy", "yyy", "yyy", "zzz", "zzz"), 
                              SessionId = c("d.0686", "d.0686", "f.1871", "f.1871", "f.1871", 
                                            "n.9210", "n.9210")), 
                         .Names = c("Website", "ClientID", "SessionId"), 
                         row.names = c(NA, -7L), class = "data.frame")
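
For the chord diagram mentioned in the question, a minimal sketch could look like the following. It assumes the circlize package is installed and uses its `chordDiagram()` function. The matrix `shared` is typed in by hand from the off-diagonal counts printed above, with the diagonal set to 0 as an assumption so that no site links to itself.

```r
# Shared-session counts from the output above, diagonal set to 0
domains <- c("domain1", "domain2", "domain3", "domain4")
shared <- matrix(c(0, 2, 0, 0,
                   2, 0, 1, 1,
                   0, 1, 0, 1,
                   0, 1, 1, 0),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(domains, domains))

# Only draw if circlize is available (assumption: installed from CRAN)
if (requireNamespace("circlize", quietly = TRUE)) {
  # symmetric = TRUE uses only the lower triangle, so each pair is drawn once
  circlize::chordDiagram(shared, symmetric = TRUE)
  # Reset circlize's global plotting state afterwards
  circlize::circos.clear()
}
```

The width of each link is proportional to the number of shared sessions, so the domain1-domain2 link should appear twice as wide as the others.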
12b345b6b78