I am analyzing the flow of customers between different shopping venues. I have data like this:
df <- data.frame(customer.id=letters[seq(1,7)],
shop.1=c(1,1,1,1,1,0,0),
shop.2=c(0,0,1,1,1,1,0),
shop.3=c(1,0,0,0,0,0,1))
df
#> customer.id shop.1 shop.2 shop.3
#> 1 a 1 0 1
#> 2 b 1 0 0
#> 3 c 1 1 0
#> 4 d 1 1 0
#> 5 e 1 1 0
#> 6 f 0 1 0
#> 7 g 0 0 1
So, for example:
customer "a" shopped at shops 1 & 3 only,
customer "b" shopped at shop 1 only,
customer "c" shopped at shops 1 & 2 only,
- etc.
I want to summarize the data like so:
#> shop.1 shop.2 shop.3
#> shop.1 5 3 1
#> shop.2 3 4 0
#> shop.3 1 0 2
So, for example, row 1 reads:
- 5 people shopped at both shop 1 and shop 1 (this is obviously a redundant observation)
- 3 people shopped at both shop 1 and shop 2
- 1 person shopped at both shop 1 and shop 3
How can I accomplish this (please note: I have many shops in my data set, so a scalable approach is preferred)?