1

I am currently using the Johnson package to transform some data (e.g. percentage of incorrectly recalled items and similar).

(https://cran.r-project.org/web/packages/Johnson/Johnson.pdf)

The NE.Johnson function works fine for most of my data apart from two variables. When trying to run the function I receive the following error:

>recall_Johnson <- RE.Johnson(RECALL)

Error in RE.ADT(xsl[, i]) : object 'p' not found

I am really hoping you guys can give me a hand because I can't work out why I am getting the error!

I have visually searched the data and I can't find anything that really makes the two problem variables stand. Below is a print out of the sorted data for one variable that works fine (percentage of inhibition errors) and another which does not work (percentage of recall errors).

WORKS FINE: Inhibition Errors

> sort.x=sort(Inhibpercent)
> print(sort.x)

  [1]  0.0000000  0.0000000  0.0000000  0.0000000  0.0000000  0.0000000
  [7]  0.0000000  0.0000000  0.0000000  0.0000000  0.0000000  0.0000000
 [13]  0.8403361  0.8474576  1.2605042  1.6666667  1.6666667  1.6666667
 [19]  1.6666667  1.6666667  1.6666667  1.6666667  1.6806723  1.6949153
 [25]  1.6949153  1.6949153  1.7021277  1.7241379  1.7241379  1.7543860
 [31]  2.0833333  2.1008403  2.1276596  2.5000000  2.5000000  2.5104603
 [37]  2.5423729  2.5531915  2.9166667  2.9166667  2.9166667  2.9166667
 [43]  2.9411765  2.9535865  3.0042918  3.3333333  3.3333333  3.3333333
 [49]  3.3333333  3.3613445  3.3755274  3.3755274  3.3898305  3.3898305
 [55]  3.4188034  3.4482759  3.7500000  3.7656904  3.7815126  3.7815126
 [61]  3.8297872  4.1666667  4.1841004  4.2016807  4.2194093  4.5833333
 [67]  4.6025105  4.6511628  4.6808511  5.0000000  5.0000000  5.0000000
 [73]  5.0209205  5.0847458  5.0847458  5.0847458  5.0847458  5.1282051
 [79]  5.4393305  5.5319149  5.8333333  6.6666667  6.6666667  6.7796610
 [85]  6.7796610  6.7796610  6.7796610  6.8965517  6.8965517  7.3033708
 [91]  8.3333333  8.3333333  8.3333333  8.3333333  8.3333333  8.3333333
 [97]  8.4388186  8.4745763  8.4745763  8.4745763  8.6206897  9.5833333
[103]  9.6638655 10.0000000 10.1694915 10.1694915 10.3448276 10.5042017
[109] 10.7142857 11.3924051 11.6666667 11.6666667 12.0833333 13.3333333
[115] 15.2542373 18.3333333 20.7547170 25.8536585

DOES NOT WORK: Recall Errors

> sort.x=sort(incorrectrecallpercent)
> print(sort.x)

  [1]  0.0000000  0.0000000  0.0000000  0.0000000  0.0000000  0.0000000
  [7]  0.0000000  0.0000000  0.0000000  0.0000000  0.0000000  0.0000000
 [13]  0.0000000  0.0000000  0.0000000  0.0000000  0.0000000  0.0000000
 [19]  0.0000000  0.0000000  0.0000000  0.0000000  0.4166667  0.4166667
 [25]  0.4201681  0.4201681  0.4201681  0.4219409  0.4255319  0.4255319
 [31]  0.4878049  0.8333333  0.8368201  0.8403361  0.8403361  0.8438819
 [37]  1.2500000  1.2500000  1.2500000  1.2605042  1.2875536  1.6666667
 [43]  1.6666667  1.6666667  1.6666667  1.6666667  1.6666667  1.6736402
 [49]  1.6949153  1.6949153  1.6949153  2.1186441  2.1186441  2.1186441
 [55]  2.5000000  2.5000000  2.5104603  2.5210084  2.5210084  2.9166667
 [61]  2.9787234  3.3333333  3.3333333  3.3898305  3.3898305  3.7500000
 [67]  3.7735849  3.7974684  4.1666667  4.1841004  4.6025105  5.0000000
 [73]  5.0000000  5.0847458  5.0847458  5.1724138  5.3571429  5.4621849
 [79]  5.5084746  5.5555556  6.6666667  6.6666667  6.7226891  6.7796610
 [85]  6.7796610  6.8965517  7.0175439  8.3333333  8.3333333  8.3333333
 [91]  8.3333333  8.3333333  8.3333333  8.3333333  8.3333333 10.0000000
 [97] 10.0000000 10.0000000 10.1694915 10.3448276 11.2359551 11.6666667
[103] 12.2807018 12.2881356 13.5593220 13.7931035 13.7931035 16.1016949
[109] 16.2790698 16.4556962 16.9491525 17.9487179 26.1603376 29.3103448
[115] 32.2033898 32.2033898 32.2033898 39.9159664

Apart from recall having slightly more instances of zero, I can't really see any substantial differences? But perhaps I am overlooking something?

I have tried following the guidance from this thread: Apply a loop to a data frame

It is suggested to modify this code (lines 101-103 in RE.Johnson.R):

if(xsb.valida[1,i]==0) xsb.adtest[1,i]<-(RE.ADT(xsb[,i])$p)
if(xsl.valida[1,i]==0) xsl.adtest[1,i]<-(RE.ADT(xsl[,i])$p)
if(xsu.valida[1,i]==0) xsu.adtest[1,i]<-(RE.ADT(xsu[,i])$p)

to:

if (xsb.valida[1, i] == 0 & any(xsb[, i]!=xsb[1, i])){
    xsb.adtest[1, i] <- (RE.ADT(xsb[, i])$p)
  }
else{ xsb.adtest[1, i] <- 0
 }   
if (xsl.valida[1, i] == 0 & any(xsl[, i]!=xsl[1, i])) {
xsl.adtest[1, i] <- (RE.ADT(xsl[, i])$p)
}
else{ xsl.adtest[1, i] <- 0
}
if (xsu.valida[1, i] == 0 & any(xsu[, i]!=xsu[1, i])) {
xsu.adtest[1, i] <- (RE.ADT(xsu[, i])$p)
}
else{xsu.adtest[1, i] <- 0
}

The reasoning behind the change is the same error can occur when the function tries to perform the transformation to a vector of equal values. I don't think that applies to my data because the values in the vector are not equal, but perhaps I have misunderstood? If the function breaks the vector into sections...perhaps this problem could occur because of the number of zeros in the beginning of the sorted list? I have tried changing the code, but unfortunately, it did solve the problem.

I see that the code below has been inserted to handle NA/NaN occurrences. It appears near the bottom of the script (lines 106-108 in NE.Johnson.R) and just under where I get the error (line 101 in NE.Johnson.R). Which perhaps could be relevant to the error I am getting?

# insertion by Boda Martin to handle with NA/NaN occurrences
xsb.adtest[which(is.na(xsb.adtest))] <- 0 
xsl.adtest[which(is.na(xsl.adtest))] <- 0 
xsu.adtest[which(is.na(xsu.adtest))] <- 0 

Any help, guidance or gentle nudging in the correct direction would be much appreciated.

Here is the link to download the package if you want to see the code: https://cran.r-project.org/web/packages/Johnson/index.html

Many thanks, Rachel

Rachel
  • 11
  • 2

1 Answers1

0

Wow. The source code for that package is fraught with peril!

I made the changes suggested in the SO link you provided and it does, indeed — and, in theory — solve your issue:

# install.packages("devtools")
devtools::install_github("hrbrmstr/Johnson")

library(Johnson)

c(0.0000000, 0.0000000, 0.0000000, 0.0000000, 0.0000000, 0.0000000, 0.0000000,
0.0000000, 0.0000000, 0.0000000, 0.0000000, 0.0000000, 0.0000000, 0.0000000,
0.0000000, 0.0000000, 0.0000000, 0.0000000, 0.0000000, 0.0000000, 0.0000000,
0.0000000, 0.4166667, 0.4166667, 0.4201681, 0.4201681, 0.4201681, 0.4219409,
0.4255319, 0.4255319, 0.4878049, 0.8333333, 0.8368201, 0.8403361, 0.8403361,
0.8438819, 1.2500000, 1.2500000, 1.2500000, 1.2605042, 1.2875536, 1.6666667,
1.6666667, 1.6666667, 1.6666667, 1.6666667, 1.6666667, 1.6736402, 1.6949153,
1.6949153, 1.6949153, 2.1186441, 2.1186441, 2.1186441, 2.5000000, 2.5000000,
2.5104603, 2.5210084, 2.5210084, 2.9166667, 2.9787234, 3.3333333, 3.3333333,
3.3898305, 3.3898305, 3.7500000, 3.7735849, 3.7974684, 4.1666667, 4.1841004,
4.6025105, 5.0000000, 5.0000000, 5.0847458, 5.0847458, 5.1724138, 5.3571429,
5.4621849, 5.5084746, 5.5555556, 6.6666667, 6.6666667, 6.7226891, 6.7796610,
6.7796610, 6.8965517, 7.0175439, 8.3333333, 8.3333333, 8.3333333, 8.3333333,
8.3333333, 8.3333333, 8.3333333, 8.3333333, 10.0000000, 0.0000000, 10.0000000,
10.1694915, 10.3448276, 11.2359551, 11.6666667, 2.2807018, 12.2881356,
13.5593220, 13.7931035, 13.7931035, 16.1016949, 6.2790698, 16.4556962,
16.9491525, 17.9487179, 26.1603376, 29.3103448, 2.2033898, 32.2033898,
32.2033898, 39.9159664) -> incorrectrecallpercent

str(RE.Johnson(incorrectrecallpercent))
#> List of 8
#>  $            : chr "Johnson Transformation"
#>  $ function   : chr "SB"
#>  $ p          : num 0.00058
#>  $ transformed: num [1:118] -1.08 -1.08 -1.08 -1.08 -1.08 ...
#>  $ f.gamma    : num 2.65
#>  $ f.lambda   : num 112
#>  $ f.epsilon  : num -0.921
#>  $ f.eta      : num 0.778

I'll leave the computational output veracity to you.

Of note: there is a second package on CRAN for performing the Johnson transformation:

library(jtrans)
library(nortest)

str(jtrans(incorrectrecallpercent, test="ad.test"))
## List of 10
##  $ original   : num [1:118] 0 0 0 0 0 0 0 0 0 0 ...
##  $ transformed: num [1:118] -1.09 -1.09 -1.09 -1.09 -1.09 ...
##  $ type       : chr "sl"
##  $ test       : chr "ad.test"
##  $ eta        : num 0.824
##  $ gamma      : num -1.06
##  $ lambda     : num NA
##  $ epsilon    : num -0.967
##  $ z          : num 0.26
##  $ p.value    : num 0.000397
##  - attr(*, "class")= chr [1:2] "sl" "jtrans"

Again, only you are going to know what is correct in terms of results.

hrbrmstr
  • 77,368
  • 11
  • 139
  • 205