Suppose I have key, value pairs that comprise a userId and a list of boolean integers indicating that the user has an attribute:
userId hasAttrA hasAttrB hasAttrC
joe 1 0 1
jack 1 1 0
jane 0 0 1
jeri 1 0 0
In Scala code, the data structure looks like:
var data = Array(("joe", List(1, 0, 1)),
("jack", List(1, 1, 0)),
("jane", List(0, 0, 1)),
("jeri", List(1, 0, 0)))
I would like to compute the fraction of all users that has the attributes. However, this computation requires that I can sum over all the keys, which I don't know how to do. So I would like to calculate:
- How many users there are?
data.size // 4
- What fraction of users has attribute A?
Should be: sum(hasAttrA) / data.size = 3/4 = 0.75
- What fraction of users has attribute B?
Should be: sum(hasAttrB) / data.size = 1/4 = 0.25
etc.
How can I compute the sums across all the keys, and how can I compute the final percentages?
EDIT 2/24/2016:
I can manually find the sums of individual columns like so:
var sumAttributeA = data.map{ case(id, attributeList) => attributeList(0)}.sum
var sumAttributeB = data.map{ case(id, attributeList) => attributeList(1)}.sum
var sumAttributeC = data.map{ case(id, attributeList) => attributeList(2)}.sum
var fractionAttributeA = sumAttributeA.toDouble/data.size
//fractionAttributeA: Double = 0.75
var fractionAttributeB = sumAttributeB.toDouble/data.size
//fractionAttributeB: Double = 0.25