I have a tidy data set that describes attributes of products. Each product has many attributes, and each attribute is described in its own row. My goal is to do some calculations on each product without using loops. I want to avoid loops because there are several hundred thousand products, and thus many millions of attributes.
Toy dataset with only one product:
df <- data.frame(
  productID   = 1,
  attributeID = seq(1, 15, 1),
  dataType    = c('range', 'range', 'predefined', 'predefined',
                  'bool', 'bool', 'bool', 'bool',
                  'double', 'double', 'double', 'double', 'double', 'double', 'double'),
  double      = c(NA, NA, NA, NA, NA, NA, NA, NA, 0, 0, 15, 11.4, 6, 0, 0),
  logical     = c(NA, NA, NA, NA, TRUE, FALSE, FALSE, FALSE, NA, NA, NA, NA, NA, NA, NA),
  predefined  = c(NA, NA, 'Black', 'Round', NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA),
  from.value  = c(0, 0, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA),
  to.value    = c(249, 368, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA)
)
#    productID attributeID   dataType double logical predefined from.value to.value
# 1          1           1      range     NA      NA       <NA>          0      249
# 2          1           2      range     NA      NA       <NA>          0      368
# 3          1           3 predefined     NA      NA      Black         NA       NA
# 4          1           4 predefined     NA      NA      Round         NA       NA
# 5          1           5       bool     NA    TRUE       <NA>         NA       NA
# 6          1           6       bool     NA   FALSE       <NA>         NA       NA
# 7          1           7       bool     NA   FALSE       <NA>         NA       NA
# 8          1           8       bool     NA   FALSE       <NA>         NA       NA
# 9          1           9     double    0.0      NA       <NA>         NA       NA
# 10         1          10     double    0.0      NA       <NA>         NA       NA
# 11         1          11     double   15.0      NA       <NA>         NA       NA
# 12         1          12     double   11.4      NA       <NA>         NA       NA
# 13         1          13     double    6.0      NA       <NA>         NA       NA
# 14         1          14     double    0.0      NA       <NA>         NA       NA
# 15         1          15     double    0.0      NA       <NA>         NA       NA
For example, how would one go about counting the zeros in the double column for each product?
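To make the desired result concrete, here is a minimal base R sketch of the kind of loop-free, per-product calculation I am after. It uses a reduced copy of the toy data (only productID and double), and the names is_zero and zero_counts are just illustrative. I have not checked whether this approach scales to millions of rows, so treat it as a statement of the expected output rather than a solution.

```r
# Reduced copy of the toy data: only the two columns needed for this count.
df <- data.frame(
  productID = 1,
  double    = c(NA, NA, NA, NA, NA, NA, NA, NA, 0, 0, 15, 11.4, 6, 0, 0)
)

# Flag rows whose double value is exactly zero; NA rows count as "not zero".
is_zero <- !is.na(df$double) & df$double == 0

# Sum the flags within each product. tapply() does the grouping without an
# explicit R-level loop over products.
zero_counts <- tapply(is_zero, df$productID, sum)

print(zero_counts)
```

For the single product in the toy data this should give a count of 4, since the double column contains four zeros among its seven non-missing values.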