Calculate aggregated variations from expected values (not std deviations)

Question

I am am trying to use it to model the pricing accuracy of a group of sellers in a network.

My data set (pricing) looks like this:

  transactionID sellerID expectedprice actualprice pricediff
1          1001      251           200         210        10
2          1002      101           200         300       100
3          1003      251           400         190      -210
4          1004      251           300         300         0
5          1005      101           250         250         0
6          1006      350           200         210        10
7          1007      401           400         400         0

Note: I am not trying to do a standard deviation calculation since I am not trying to calculate variance from the mean, but rather variance from the expected value column which will differ depending on the transaction.

I am comfortable inserting new columns to get absolute variances from the expected value into the table using:

pricing$diffabs <- abs(pricing$pricediff)

which results in the following:

transactionID   sellerID    expectedprice   actualprice pricediff   diffabs
1001            251         200             210         10          10
1002            101         200             300         100         100
1003            251         400             190         -210            210
1004            251         300             300         0           0
1005            101         250             250         0           0
1006            350         200             210         10          10
1007            401         400             400         0           0

How does one then calculate a variance score for each seller which would be:

the sum of abs(pricing$diff) grouped at the "sellerID" divided by the number of observations (count) of "sellerID" in the data.

The output I would expect to be would be the following:

SellerID    Count   Sumofdiffabs    Variation
251         3       220             73.33333333
101         2       100             50
350         1       10              10
401         1       0               0

The other help topics that deal with variances in R at an aggregated level seem to only deal with standard deviation or variances from mean, such as this:

Calculating grouped variance from a frequency table in R

The aggregate function works well for me when using a simple function like standard deviation, but not where I have to figure out how to insert a count into function. What is throwing me off, is that my variance is a deviation not from the mean, but from a column result in my table.

You should start by creating a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). — nsheff, Jun 03 '15 at 08:26

HOSS_JFL · Accepted Answer · 2015-06-03T15:32:01.943

m =  matrix(c(1001,251,200,210,10,1002,101,200,300,100,1003,251,400,190,-210,1004,251,300,300,0,1005,101,250,250,0,1006,350,200,210,10,1007,401,400,400,0),ncol = 5,nrow=7,byrow=TRUE)
colnames(m) = c("transactionID","sellerID","expectedprice","actualprice","pricediff")
pricing = as.data.frame(m)
pricing$diffabs <- abs(pricing$pricediff)
pricing

  transactionID sellerID expectedprice actualprice pricediff diffabs
           1001      251           200         210        10      10
           1002      101           200         300       100     100
           1003      251           400         190      -210     210
           1004      251           300         300         0       0
           1005      101           250         250         0       0
           1006      350           200         210        10      10
           1007      401           400         400         0       0

So here is the result:

library(data.table)
pricing = as.data.table(pricing)
f <- function(x) {list( Count=length(x))}
result <- pricing[ , c(f(diffabs),     Sumofdiffabs=sum(diffabs),Variation=mean(diffabs)),by=sellerID]
result
   sellerID Count Sumofdiffabs Variation
1:      251     3          220  73.33333
2:      101     2          100  50.00000
3:      350     1           10  10.00000
4:      401     1            0   0.00000

where Variation = Sumofdiffabs / Count (grouped by SellerID) — guymeetsdata, Jun 03 '15 at 11:47
You can easily generate the expected output with the code that I posted! — HOSS_JFL, Jun 03 '15 at 13:05
The expected output using the code you posted shows me diffabs, but not variation as defined. Sorry if I am a little naive, but this is literally my first question posted here. — guymeetsdata, Jun 03 '15 at 14:25
Yes it does - thanks it worked perfectly once I installed the correct packages (data.table and its dependencies). Have ticked it as resolved. — guymeetsdata, Jun 06 '15 at 18:22

Calculate aggregated variations from expected values (not std deviations)

1 Answers1