I'm trying to apply a beta transformation over time when calculating entities' relative ability to achieve a certain target. The beta transformation ensures that all the disparate datasets are transformed consistently, so that when a series of weightings is applied, each weighting meaningfully represents a dataset's "contribution" to an entity's aggregated score.
The challenge with applying the beta transformation over time is that it may produce a lower transformed score even if the raw data are unchanged (see Figure 1):
Figure 1: Beta-transformed scores over time
I believe this happens because the transformation is relative: even if an entity's raw value doesn't change, other entities' raw values could be improving, so the beta transformation depresses that entity's score.
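To illustrate the effect described above, here is a minimal sketch with made-up numbers. It assumes the scores are built by ranking each entity within its own year and passing that percentile through `qbeta(p, 5, 2)`; the data, the entity names, and the helper `within_year_score` are hypothetical and not taken from sample_data.csv.

```r
# Hypothetical raw values for five entities; entity "A" is unchanged at 9.6
year1 <- c(A = 9.6, B = 8.0, C = 9.0, D = 10.5, E = 12.0)
year2 <- c(A = 9.6, B = 9.8, C = 10.2, D = 11.0, E = 12.5)

# Assumed scoring rule: percentile rank within the same year, kept strictly
# inside (0, 1), then mapped through the Beta(5, 2) quantile function
within_year_score <- function(v) {
  p <- rank(v) / (length(v) + 1)
  s <- round(qbeta(p, 5, 2) * 100, 2)
  setNames(s, names(v))
}

score1 <- within_year_score(year1)
score2 <- within_year_score(year2)

# "A" has the same raw value in both years, but its rank drops from 3rd to
# 1st (lowest) because every other entity improved, so its score falls
print(c(year1 = score1[["A"]], year2 = score2[["A"]]))
```

Under this assumption the drop comes entirely from the change in rank, not from anything that happened to entity "A" itself.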
I've tried to fix this with some simple math, scoring every year against the first year's values:
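    # Unique years (ascending) and entity codes
    years <- sort(unique(x$year))
    codes <- unique(x$code)

    # Reshape the long data into a codes-by-years matrix of raw values
    values <- ps <- rescored <- matrix(0, length(codes), length(years))
    for (i in seq_len(nrow(x))) {
      j <- which(years == x$year[i])
      k <- which(codes == x$code[i])
      values[k, j] <- x$value[i]
    }

    # Score every year against the first year's distribution
    T1values <- values[, 1]
    for (k in seq_along(codes)) {
      for (j in seq_along(years)) {
        # Share of first-year values at or below this value, kept inside (0, 1)
        ps[k, j] <- (1 + sum(T1values <= values[k, j])) / (2 + length(codes))
        # Map through the Beta(5, 2) quantile function and scale to 0-100
        rescored[k, j] <- round(qbeta(ps[k, j], 5, 2) * 100, 2)
      }
    }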
However, this method oversmooths an entity's scores over time, so small decimal distinctions are lost. For some reason I also get a gap between those entities at target (17 on the raw scale, 100 on the transformed scale; the transformed scale ranges from 0 to 100).
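A small sketch of why the first-year comparison behaves this way, using the same `(1 + count) / (2 + n)` formula as the snippet above. The `baseline` vector and the helper `baseline_score` are hypothetical; only the raw values for code 840 come from the sample data. Because the formula counts how many first-year values sit at or below a raw value, every raw value that falls between the same two first-year values receives the same percentile.

```r
# Hypothetical first-year raw values for six entities (one already at target 17)
baseline <- c(8.1, 9.2, 9.5, 9.7, 11.3, 17)

# Same formula as the loop above: count baseline values at or below v
baseline_score <- function(v, base) {
  p <- (1 + sum(base <= v)) / (2 + length(base))
  round(qbeta(p, 5, 2) * 100, 2)
}

# Some of the raw values for code 840 from the sample table
raw_840 <- c(9.534868563, 9.566570011, 9.618749298, 9.623843583)
scores <- sapply(raw_840, baseline_score, base = baseline)

# All four values lie between 9.5 and 9.7, so the count never changes
print(scores)

# The largest attainable percentile is (n + 1) / (n + 2), which is below 1,
# so an entity at target cannot reach 100 under this formula
print(baseline_score(17, baseline))
```

The second print may be related to the gap at target: with this formula the percentile never reaches 1, so the transformed score of an entity at 17 stays below 100.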
I need the values to match as closely as possible to the "mrya score" for the latest year, 2015 (see sample_data.csv).
Any ideas much appreciated!
** EDIT ** Sample input/output data
    code  raw value    year  transformed   beta transformed
                             (new method)  (old method)
    840   9.534868563  2006  69.62         74.1163453
    840   9.566570011  2007  69.62         73.07062613
    840   9.568561874  2008  69.62         72.87069707
    840   9.575458466  2009  69.62         72.66601396
    840   9.618749298  2010  69.62         72.22070745
    840   9.618945848  2011  69.62         72.22070745
    840   9.622768849  2012  69.62         71.55126994
    840   9.623843583  2013  69.62         72.26018915
    840   9.623843583  2014  69.62         71.56642954
    840   9.623843583  2015  69.62         71.56642954
As you can see above, even though the raw values increase incrementally or stay the same, applying a qbeta(5,2) transform results in some transformed values decreasing through the time series (as shown in Figure 1 and in column 5, "beta transformed (old method)"). My attempted fix (the code snippet above), on the other hand, generates a flat line and oversmooths the differences, transforming different raw values into the same score, 69.62. I hope this clarifies my question!
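One direction that might be worth trying, sketched here under assumptions and not tested against sample_data.csv or the "mrya score": keep the first year as the fixed reference, but interpolate linearly between the first-year percentiles instead of counting values. The `baseline` vector, the plotting positions `i / (n + 1)`, and the name `to_p` are all hypothetical choices.

```r
# Hypothetical first-year raw values (not from sample_data.csv), sorted
baseline <- sort(c(8.1, 9.2, 9.5, 9.7, 11.3, 17))
n <- length(baseline)

# Assumed plotting positions in (0, 1) for the sorted baseline values
p_base <- seq_len(n) / (n + 1)

# Piecewise-linear map from raw value to percentile; rule = 2 clamps values
# outside the baseline range to the nearest end point
to_p <- approxfun(baseline, p_base, rule = 2)

# Some of the raw values for code 840 from the sample table
raw_840 <- c(9.534868563, 9.566570011, 9.575458466, 9.618749298, 9.623843583)
scores <- round(qbeta(to_p(raw_840), 5, 2) * 100, 2)

# Distinct raw values now map to distinct, increasing scores
print(scores)
```

Because the reference distribution is fixed, an unchanged raw value keeps the same score in every year, and small increases in the raw value are no longer collapsed into one score. Whether the result lands close enough to the required 2015 values would depend on the real first-year data.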