
I have been working with R for about six months, so I am still something of a novice. I have a large dataset of 260 columns and 1000 rows. I need to convert the data to standard deviation units and then remove the rows containing outliers that do not meet a set SD criterion. I have managed to convert the data and remove the necessary rows; however, after doing this I need to convert the data back to its original values. When I try to do this, R throws an error that I cannot get past. I assume this is because the dataset is now a different size than it was before I standardised it, but I can't think of a way to work around this.

I have looked through past questions on this issue but have not found anything that solves my problem, so any help would be greatly appreciated.

Here is a minimal example of what I am trying to do and where it fails:

y <- 30
C <- 30
ds <- matrix(data = NA, nrow = y, ncol = C)

for (i in 1:y) {
  ds[i, ] <- sample(1:100, C, TRUE)
}

ds_z <- scale(ds, center = TRUE, scale = TRUE)
no_out <- ds_z[!rowSums(ds_z > 2), ]
revrs <- t(apply(no_out, 1, function(r) r * attr(no_out, 'scaled:scale') + attr(no_out, 'scaled:center')))
jdtrulson
  • You may have missed the `,` `no_out <- ds_z[!rowSums(ds_z >2),]` (assuming you want to subset the rows) – akrun Dec 03 '22 at 22:37
  • In addition, you are using the whole attributes while looping. Maybe you want `i1 <- !rowSums(ds_z > 2); no_out <- ds_z[i1, ]; lst1 <- lapply(attributes(ds_z)[-1], \(x) x[i1]);no_out2 <- (no_out * lst1$`scaled:scale`) + lst1$`scaled:center`; no_out2 <- round(no_out2)` – akrun Dec 03 '22 at 22:53
  • Yes, you are right; I had forgotten to add the "," in the above example. Unfortunately, that wasn't the issue. I tried your code but received the error " in lst1$scaled:scale : argument of length 0", and I couldn't figure out how to solve it – jdtrulson Dec 04 '22 at 00:51
  • There was a quote around the `"scaled:scale"` and `"scaled:center"` – akrun Dec 04 '22 at 17:11
  • Using this method also seems to throw up an error saying, " argument length 0". **'i1 <- !rowSums(ds_z > 2); no_out <- ds_z[i1, ]; lst1 <- lapply(attributes(ds_z)[-1], \(x) x[i1]);no_out2 <- (no_out * lst1$scaled:scale) + lst1$scaled:center; no_out2 <- round(no_out2)'** – jdtrulson Dec 04 '22 at 22:16
  • Sorry, I didn't understand your comment. I meant `lst1$"scaled:scale"` and `lst1$"scaled:center"` – akrun Dec 04 '22 at 22:17
  • Thanks, that worked perfectly; just what I was looking for, although I can't say I understand how the 'apply(attributes(ds_z)[-1], \(x) x[i1])' portion of the code is working, but it works great :) – jdtrulson Dec 04 '22 at 22:21
  • In your code, you loop over each row of `ds_z` but, inside the loop, extract the full attributes – akrun Dec 04 '22 at 22:22
  • Is there a way to keep the values as floats and not have them converted into integers? Sorry to extend this conversation – jdtrulson Dec 04 '22 at 22:41
  • Just remove the last step i.e. `no_out2 <- round(no_out2)` (you don't need to round them) – akrun Dec 04 '22 at 22:42
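
The "argument of length 0" error in the comments above appears to come from how R parses an unquoted name containing a colon. The following is a minimal sketch of that behaviour; `lst1` here is a hypothetical stand-in list, not the real output of `scale()`, and the names `by_backtick`, `by_quote`, `by_bracket` and `res` are illustrative only.

```r
# Hypothetical list standing in for the attributes returned by scale()
lst1 <- list(`scaled:center` = c(10, 20), `scaled:scale` = c(2, 4))

# Backticks, quotes, or [[ ]] all reach the element with the non-syntactic name
by_backtick <- lst1$`scaled:scale`
by_quote    <- lst1$"scaled:scale"
by_bracket  <- lst1[["scaled:scale"]]
identical(by_backtick, by_quote) && identical(by_quote, by_bracket)

# Without quoting, R parses the expression as (lst1$scaled):scale.
# lst1$scaled matches two names ambiguously and yields NULL,
# so the `:` operator fails; the message is captured rather than raised.
res <- tryCatch(lst1$scaled:scale, error = function(e) conditionMessage(e))
res
```

Using `[[ ]]` with a string is arguably the least fragile of the three forms, since it survives copy-and-paste through formatters that strip backticks.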

1 Answer


Try

i1 <- !rowSums(ds_z > 2)
no_out <- ds_z[i1, ]
lst1 <- lapply(attributes(ds_z)[-1], \(x) x[i1])
no_out2 <- (no_out * lst1$`scaled:scale`) + lst1$`scaled:center`
no_out2 <- round(no_out2)
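
One caveat, offered as a sketch rather than a definitive fix: `scaled:center` and `scaled:scale` hold one value per column, whereas `i1` marks rows. Indexing them with `i1` runs cleanly in the example only because it has as many rows as columns, and multiplying a matrix by a plain vector recycles the vector down the columns, so the result may not match the original data. Separately, subsetting a scaled matrix with `[` drops the scaling attributes, which would explain why `attr(no_out, 'scaled:scale')` in the question comes back empty. The sketch below reads the attributes from `ds_z` and undoes the scaling column by column with `sweep()`. The names `keep`, `ctr`, `scl` and `restored`, and the fixed seed, are assumptions introduced for illustration.

```r
set.seed(1)  # assumption: fixed seed so the sketch is reproducible
ds <- matrix(sample(1:100, 30 * 30, replace = TRUE), nrow = 30, ncol = 30)

ds_z <- scale(ds, center = TRUE, scale = TRUE)

# Rows in which no value lies more than 2 SD above its column mean
keep <- rowSums(ds_z > 2) == 0
no_out <- ds_z[keep, , drop = FALSE]

# Subsetting drops the scaling attributes, so they must be read from ds_z
is.null(attr(no_out, "scaled:scale"))

ctr <- attr(ds_z, "scaled:center")  # one value per column
scl <- attr(ds_z, "scaled:scale")   # one value per column

# Undo the scaling column by column: x = z * sd + mean
restored <- sweep(sweep(no_out, 2, scl, `*`), 2, ctr, `+`)

# Compare the restored rows with the corresponding original rows
all.equal(restored, ds[keep, , drop = FALSE], check.attributes = FALSE)
```

If the SD criterion is meant to be two-sided, `rowSums(abs(ds_z) > 2) == 0` would be the analogous mask; whether that applies depends on the intended rule.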
akrun