Quickly accessing overlapping information with data.table

Question

My goal is to quickly get the number of common opponents for each car in a data set. For example, if the Datsun raced the Mazda RX4 and the Merc 230, and the Mazda RX4 also raced the Merc 230, we would return 1 as the number of common opponents for the Datsun and the Mazda RX4. Below is a sample built from the mtcars data, followed by the function I run on it. For such a small data set it takes around 0.3 seconds, but for larger data sets it takes quite a while.

# set up the sample data set
library(data.table)
set.seed(44)
data(mtcars)
mtcars$car <- row.names(mtcars)
mtcars <- as.data.table(mtcars)

# give each car a random number of races between 1 and 5
for (i in 1:nrow(mtcars)) {
    mtcars[i, count := sample(1:5, 1)]
}

expanded <- data.table(car = rep(mtcars$car, mtcars$count),
                       opponent = sample(mtcars$car, mtcars$count),
                       wins=sample(1:mtcars$count,mtcars$count))

head(expanded)
             car          opponent wins
1:     Mazda RX4 Chrysler Imperial    1
2:     Mazda RX4          Merc 280    2
3:     Mazda RX4    Toyota Corolla    4
4:     Mazda RX4          Merc 230    3
5: Mazda RX4 Wag Chrysler Imperial    1
6: Mazda RX4 Wag          Merc 280    2

# this is the function I use now, which takes a while
commonCars <- function(carA, carB) {
    tA <- unique(expanded[car == carA, opponent]) # unique opponents of the first car
    tB <- unique(expanded[car == carB, opponent]) # unique opponents of the first car's opponent
    commonTeams <- tB[tB %in% tA] # opponents they have in common

    # number of carA's rows whose opponent is one of the common opponents
    return(nrow(expanded[car == carA & opponent %in% commonTeams, ]))
}

ptm <- proc.time()

for (i in unique(expanded[, car])) { # loop over each individual car
    for (j in unique(expanded[car == i, opponent])) { # loop over the cars it raced
        expanded[car == i & opponent == j, common := commonCars(i, j)]
    }
}

proc.time() - ptm
   user  system elapsed 
   0.29    0.00    0.30
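
The two nested loops above call the helper once per (car, opponent) pair, which could also be written as a single grouped assignment. The following is only a sketch on hypothetical toy data: `races`, `lookup` and `commonToy` are illustrative names that are not part of the question, and the helper reads from a snapshot of the table as a cautious choice so that it never reads the table being modified.

```r
library(data.table)

# Hypothetical toy data standing in for `expanded`
races <- data.table(
  car      = c("A", "A", "B", "B", "C"),
  opponent = c("B", "C", "C", "A", "A")
)

# Snapshot that the helper reads from (illustrative, see lead-in)
lookup <- copy(races)

# Same logic as commonCars(), but reading from `lookup`
commonToy <- function(carA, carB) {
  tA <- unique(lookup[car == carA, opponent])
  tB <- unique(lookup[car == carB, opponent])
  commonTeams <- tB[tB %in% tA]
  nrow(lookup[car == carA & opponent %in% commonTeams])
}

# One helper call per (car, opponent) group instead of two nested loops
races[, common := commonToy(car, opponent), by = .(car, opponent)]
print(races)
```

This removes the explicit loops but still does one set of lookups per pair, so it is not expected to change how the run time grows with the size of the data.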

head(expanded)
             car          opponent wins common
1:     Mazda RX4     Maserati Bora    4      3
2:     Mazda RX4    Hornet 4 Drive    2      3
3:     Mazda RX4        Datsun 710    3      3
4:     Mazda RX4 Chrysler Imperial    1      1
5: Mazda RX4 Wag     Maserati Bora    4      1
6: Mazda RX4 Wag    Hornet 4 Drive    2      1
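
For larger data, one possible direction is to replace the per-pair lookups with a self-join. The sketch below uses hypothetical toy data (`races`, `pairs`, `shared` and `counts` are illustrative names) and assumes that "common opponents" means the number of distinct opponents two cars share, which differs from `commonCars()` when a car races the same opponent more than once.

```r
library(data.table)

# Hypothetical toy data standing in for `expanded`
races <- data.table(
  car      = c("A", "A", "B", "B", "C"),
  opponent = c("B", "C", "C", "A", "A")
)

# Distinct (car, opponent) pairs, so repeated races are counted once
pairs <- unique(races[, .(car, opponent)])

# Self-join on opponent: each row pairs two cars that raced the same opponent
shared <- merge(pairs, pairs, by = "opponent", allow.cartesian = TRUE)

# Number of shared opponents for every ordered pair of cars
counts <- shared[, .(common = .N), by = .(car = car.x, opponent = car.y)]

# Default to 0, then fill in the pairs that share at least one opponent
races[, common := 0L]
races[counts, common := i.common, on = .(car, opponent)]
print(races)
```

The intermediate `shared` table grows with the number of cars that share each opponent, so memory use is the trade-off for avoiding the row-by-row lookups.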

Last computation with for loop can be replaced by `expanded[, common:=commonCars(car, opponent), .(car, opponent)]` — Khashaa, Jul 13 '15 at 01:22
Can you give the expected outcome for `df <- data.frame( car=c("Car1", "Car1", "Car1", "Car1", "Car2", "Car2", "Car3", "Car3", "Car3"), opponent= c("Car3","Car3", "Car2","Car2","Car1","Car3","Car1","Car1","Car2" ))` — Khashaa, Jul 13 '15 at 04:38
Your construction of `expanded` doesn't do what you seem to think it's doing. The second argument of sample should be a single number, not a vector. Only its first element is used when you give it a vector. You can see that your "opponent" and "wins" columns recycle/repeat — Frank, Jul 13 '15 at 14:23
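
Following up on the last comment, here is one way the sample data might be built so that every car gets its own random draw. It is a sketch under assumptions that the question does not state: a car never races itself or the same opponent twice, and `wins` is a random ordering of 1 to the car's number of races. `cars`, `counts` and `expanded2` are illustrative names.

```r
library(data.table)

set.seed(44)
cars   <- rownames(mtcars)   # mtcars ships with base R
counts <- sample(1:5, length(cars), replace = TRUE)

# Build each car's rows separately so `size` is always a single number
expanded2 <- rbindlist(lapply(seq_along(cars), function(i) {
  n <- counts[i]
  data.table(
    car      = cars[i],
    # assumption: no self-races and no repeated opponents
    opponent = sample(setdiff(cars, cars[i]), n),
    # assumption: wins are a random ordering of 1..n
    wins     = sample.int(n, n)
  )
}))

head(expanded2)
```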
