I am am trying to use it to model the pricing accuracy of a group of sellers in a network.
My data set (pricing) looks like this:
transactionID sellerID expectedprice actualprice pricediff
1 1001 251 200 210 10
2 1002 101 200 300 100
3 1003 251 400 190 -210
4 1004 251 300 300 0
5 1005 101 250 250 0
6 1006 350 200 210 10
7 1007 401 400 400 0
Note: I am not trying to do a standard deviation calculation since I am not trying to calculate variance from the mean, but rather variance from the expected value column which will differ depending on the transaction.
I am comfortable inserting new columns to get absolute variances from the expected value into the table using:
pricing$diffabs <- abs(pricing$pricediff)
which results in the following:
transactionID sellerID expectedprice actualprice pricediff diffabs
1001 251 200 210 10 10
1002 101 200 300 100 100
1003 251 400 190 -210 210
1004 251 300 300 0 0
1005 101 250 250 0 0
1006 350 200 210 10 10
1007 401 400 400 0 0
How does one then calculate a variance score for each seller which would be:
the sum of abs(pricing$diff)
grouped at the "sellerID" divided by the number of observations (count) of "sellerID" in the data.
The output I would expect to be would be the following:
SellerID Count Sumofdiffabs Variation
251 3 220 73.33333333
101 2 100 50
350 1 10 10
401 1 0 0
The other help topics that deal with variances in R at an aggregated level seem to only deal with standard deviation or variances from mean, such as this:
Calculating grouped variance from a frequency table in R
The aggregate function works well for me when using a simple function like standard deviation, but not where I have to figure out how to insert a count into function. What is throwing me off, is that my variance is a deviation not from the mean, but from a column result in my table.