I have a data.table on which I want to find the weighted mean of column performance
by month.
dat <- structure(list(year = c(2014, 2015, 2016, 2017, 2018, 2019, 2020,
2021, 2014, 2015, 2016, 2017, 2018, 2019, 2020),
month = c(2,
2, 2, 2, 2, 2, 2, 2, 10, 10, 10, 10, 10, 10, 10),
performance = c(0.826973794097158,
0.61975709469356, 0.924350659523548, -0.183133219063708, -0.529913189565746,
-0.148531188902535, -0.0773058814083695, 1.42862504650241, 0.465498268732376,
0.148719963224136, 0.205614191281359, 0.560651497949418, -0.484408605607923,
0.875353374774486, 0.351469397380814)),
row.names = c(NA, -15L), class = c("data.table", "data.frame"))
This data.table looks like following -
year month performance
1: 2014 2 0.82697379
2: 2015 2 0.61975709
3: 2016 2 0.92435066
4: 2017 2 -0.18313322
5: 2018 2 -0.52991319
6: 2019 2 -0.14853119
7: 2020 2 -0.07730588
8: 2021 2 1.42862505
9: 2014 10 0.46549827
10: 2015 10 0.14871996
11: 2016 10 0.20561419
12: 2017 10 0.56065150
13: 2018 10 -0.48440861
14: 2019 10 0.87535337
15: 2020 10 0.35146940
To find the weighted mean by month, I have used the following code -
setDT(dat)[, lapply(.SD, function(x) weighted.mean(x, na.rm = TRUE)), by = .(month), .SDcols = c("performance")]
and the result I am getting is -
month performance
1: 2 0.3576029
2: 10 0.3032712
However, the weighted mean performance of month 10
should be greater than month 2
as it has more positive values.
It seems that only the month 2
of the year 2021, is weighing heavily on its performance causing it to outperform the performance of month 10
.
Actually, the code above is only finding the mean
and NOT the weighted.mean
. The result is the same if I use mean
instead of weighted.mean
.
setDT(dat)[, lapply(.SD, function(x) mean(x, na.rm = TRUE)), by = .(month), .SDcols = c("performance")]
and the result after using simple mean is following, which is the same as the result of weighted.mean.
month performance
1: 2 0.3576029
2: 10 0.3032712
The desired result should give equal weight to the performance of each year so that exceptional performance in one particular year does not falsely show that the product sells wonderfully during that month each year.
Can someone point out what is wrong with my weighted mean calculation?