
I'm trying to apply a beta transformation when calculating entities' relative ability to achieve a certain target over time. The beta transformation is applied so that all the disparate datasets are transformed consistently: when a series of weightings is applied, each weighting then meaningfully represents a dataset's "contribution" to an entity's aggregated score.

The challenge with applying the beta transformation over time is that it may produce a lower transformed score even if the raw data are unchanged (see Figure below):

[Figure: Beta-transformed scores over time]

I believe this happens because the transformation is relative: each entity is scored against the other entities in the same year. If other entities' raw values improve, the beta transformation depresses an entity's score even though its own raw value is unchanged.
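A minimal sketch of this effect, using made-up numbers rather than the real data. It assumes the within-year percentile is computed as rank / (n + 1), which may differ from the percentile definition actually used; the names `raw_y1`, `raw_y2` and `beta_score` are hypothetical.

```r
# Hypothetical toy data: entity "A" keeps the same raw value in both years,
# while most other entities improve in year 2.
raw_y1 <- c(A = 9.6, B = 8.0, C = 9.0, D = 10.5, E = 7.5)
raw_y2 <- c(A = 9.6, B = 9.8, C = 9.9, D = 10.9, E = 7.9)

# Within-year percentile rank followed by the Beta(5, 2) quantile transform.
# ASSUMPTION: percentile = rank / (n + 1), so p stays strictly inside (0, 1).
beta_score <- function(v) {
  p <- rank(v) / (length(v) + 1)
  setNames(round(qbeta(p, 5, 2) * 100, 2), names(v))
}

score_y1 <- beta_score(raw_y1)
score_y2 <- beta_score(raw_y2)

# Same raw value for "A", but a lower score in year 2 because its rank dropped.
c(year1 = score_y1[["A"]], year2 = score_y2[["A"]])
```

Entity A falls from 4th to 2nd lowest of five, so its percentile and therefore its transformed score drop, although its raw value is identical in both years.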

I've tried to implement some simple math to fix this issue:

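# Build an entity-by-year matrix of raw values.
years=sort(unique(x$year))
codes=unique(x$code)
values=ps=rescored=matrix(0,length(codes),length(years))
for(i in seq_len(nrow(x)))
{
  j=which(years==x$year[i])
  k=which(codes==x$code[i])
  values[k,j]=x$value[i]
}
# Use the first year's raw values as a fixed reference distribution.
T1values=values[,1]
for(k in seq_along(codes))
{
  for(j in seq_along(years))
  {
    # Percentile of this value relative to the first-year values, kept inside (0,1).
    ps[k,j]=(1+sum(T1values<=values[k,j]))/(2+length(codes))
    # Map the percentile through the Beta(5,2) quantile function onto a 0-100 scale.
    rescored[k,j]=round(qbeta(ps[k,j],5,2)*100,2)
  }
}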

However, this method oversmooths an entity's scores over time, so small decimal distinctions are lost. For some reason I also get a gap between those entities at target (17 on the raw scale, 100 on the transformed scale; the transformed scale ranges from 0 to 100).
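One possible contributor to that gap, offered as a guess rather than a diagnosis: with the percentile formula (1 + count) / (2 + n), even an entity at or above every first-year value gets a percentile below 1, so its rescored value stays below 100. The sketch below uses a hypothetical entity count `n`, not the real number of codes.

```r
# ASSUMPTION: n is a made-up number of entities, not taken from the data.
n <- 150

# Largest percentile the formula (1 + count) / (2 + n) can produce,
# reached when a value is >= every first-year value.
p_max <- (1 + n) / (2 + n)

# Corresponding ceiling on the rescored value; it stays below 100.
top_score <- round(qbeta(p_max, 5, 2) * 100, 2)
top_score
```

If entities at target are assigned 100 elsewhere in the pipeline, this ceiling would leave a gap between them and the highest rescored value.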

I need the values to match as closely as possible to the "mrya score" for the latest year, 2015 (see sample_data.csv).

Any ideas much appreciated!

** EDIT ** Sample input/output data

code  raw value    year  transformed (new method)  beta transformed (old method)
840   9.534868563  2006  69.62                     74.1163453
840   9.566570011  2007  69.62                     73.07062613
840   9.568561874  2008  69.62                     72.87069707
840   9.575458466  2009  69.62                     72.66601396
840   9.618749298  2010  69.62                     72.22070745
840   9.618945848  2011  69.62                     72.22070745
840   9.622768849  2012  69.62                     71.55126994
840   9.623843583  2013  69.62                     72.26018915
840   9.623843583  2014  69.62                     71.56642954
840   9.623843583  2015  69.62                     71.56642954

As you can see above, even though the raw values are incrementally increasing, applying a qbeta(5,2) transform results in some transformed values decreasing through the time series (as shown in Figure 1 and in column 5, "beta transformed (old method)"). But my attempted fix (from the code snippet above) generates a flat line and oversmooths differences, transforming the different raw values into the same score, 69.62. I hope this clarifies my question!
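The flat 69.62 is consistent with the percentile being a step function of the first-year values: every raw value that falls between the same two adjacent first-year values gets the same count, hence the same score. A rough sketch of one possible workaround, linear interpolation between the first-year points, is shown below. The `baseline` vector is made up (a stand-in for T1values), and the interpolation is a swapped-in technique, not part of the original method.

```r
# Hypothetical first-year raw values (stand-in for T1values), no ties.
baseline <- c(7.5, 8.0, 9.0, 9.5, 10.5, 17)
# Some of entity 840's raw values from the table; all lie between 9.5 and 10.5.
later <- c(9.534868563, 9.566570011, 9.618749298, 9.623843583)

n <- length(baseline)

# Step-function percentile, as in the loop above: count of baseline values <= v.
p_step <- sapply(later, function(v) (1 + sum(baseline <= v)) / (2 + n))

# Interpolated percentile: same value as the step function at each baseline
# point, linear in between. rule = 2 clamps values outside the baseline range;
# ties = max keeps tied baseline values consistent with the "<=" count.
grid_p <- (1 + seq_len(n)) / (2 + n)
p_interp <- approx(sort(baseline), grid_p, xout = later,
                   rule = 2, ties = max)$y

round(qbeta(p_step, 5, 2) * 100, 2)    # one repeated score
round(qbeta(p_interp, 5, 2) * 100, 2)  # scores increase with the raw value
```

Because the reference distribution is still fixed at the first year, equal raw values keep equal scores across years, while small raw differences are no longer collapsed into one step.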

  • So what exactly is your question? If you are asking for recommendations to properly transform your data, that's really more of a statistical question and should be asked at [stats.se]. If you know exactly what transformation you want but don't know how to code it, describe it clearly. Also provide sample input and desired output so solutions can be tested. (Include data in the question, don't require downloading from other sites. See [how to create a reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example)) – MrFlick Aug 06 '16 at 03:35
  • I am set on using the qbeta transformation, but I've noticed that there are problems applying it over a time series. When applying the qbeta transformation to a particular entity's data over time, the same raw untransformed datapoint may get different beta-transformed scores. I'd like this not to happen, so that, in a way, a given raw value always maps to the same beta-transformed score. Does this make sense? I've posted the sample input and output data above. – user6684565 Aug 06 '16 at 15:08
