
Hopefully I'm not missing an answer already given here, but nothing I found seems to fit my problem.

I want to filter a 3D array for certain outlier values in the reaction times (keeping only values with x > 0.300 and x < 3). I have a structure like this: 30 subjects, 2 columns (a reaction time and an integer choice), and 250 observations:

n_subjects <- 30
n_obs <- 250

dat <- array(NaN, dim=c(n_subjects, 2, n_obs))

dat[, 1, ] <- rexp(n_obs, 1)
dat[, 2, ] <- round(runif(n_obs, 1, 5), 0)

My first approach was to use which(); it seems to work, but gives me an unstructured result:

dat[which(dat[, 1, ] > 0.3 & dat[, 1, ] < 3)]

which returns filtered values, but as a vector, losing the dimensional structure.

dat[which(dat[, 1, ] > 0.3 & dat[, 1, ] < 3)]
   [1] 1.44154641 0.52122836 0.75427634 0.72299465 0.52707838 0.58455269 1.01634364 0.68883200 1.15541663 1.69872059
  [11] 0.57827779 0.33754890 0.91186386 1.81258378 0.79937850 1.19459413 1.19862926 3.00000000 3.00000000 4.00000000
  [21] 3.00000000 3.00000000 3.00000000 3.00000000 2.00000000 2.00000000 4.00000000 5.00000000 5.00000000 4.00000000
  [31] 1.00000000 2.00000000 5.00000000 3.00000000 2.00000000 4.00000000 2.00000000 0.09598808 2.26378860 1.65597480
  [41] 0.97012070 1.97571758 0.56615487 0.58112680 3.74780963 1.13583855 3.11409406 0.22472111 0.44761366 4.95403062
  [51] 5.66179472 0.18718267 0.69218598 0.81050307 0.35018347 0.05329958 0.23688262 0.42126038 1.16712480 2.21866501
  [61] 1.00000000 5.00000000 2.00000000 3.00000000 4.00000000 4.00000000 4.00000000 4.00000000 1.00000000 1.00000000
  [71] 1.00000000 3.00000000 2.00000000 2.00000000 4.00000000 3.00000000 5.00000000 5.00000000 2.00000000 0.40950035
  [81] 0.70376002 2.33855435 0.81855408 1.16949376 1.50400404 2.71781548 0.71850858 0.90908760 0.24212159 0.02377835
  [91] 0.15044300 0.24012386 1.00252243 0.78028357 3.50965326 0.52697154 1.54606865 0.66357898 0.76511035 0.37248749
 [101] 1.00000000 4.00000000 2.00000000 4.00000000 4.00000000 3.00000000 3.00000000 1.00000000 4.00000000 2.00000000
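
For what it's worth, asking which() for array indices at least keeps track of where the matches sit in the reaction-time slice (a minimal sketch on the same dat as above); it reports positions, though, not the filtered pairs themselves:

# arr.ind = TRUE returns a (row, col) index matrix instead of a flat vector;
# rows correspond to subjects, columns to observations in dat[, 1, ]
idx <- which(dat[, 1, ] > 0.3 & dat[, 1, ] < 3, arr.ind = TRUE)
head(idx)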

I need to preserve the assignment of each value pair in the initial array. Is there any way to do this with base R?

  • I'm wondering: if you want to preserve the initial structure, what should happen to the values you filter? Should they be replaced? Otherwise filtering out values implies losing the structure because you have fewer values. – TimTeaFan Mar 01 '23 at 10:40
  • Yeah, that's true! I was thinking about it like a data frame, where I can just remove the cases (e.g. as in long format) from the data. That would imply that not every subject has the same number of observations, which would be OK for my modeling purpose. I actually don't know whether my model can handle NAs; I have to check that! – Jan Göttmann Mar 01 '23 at 10:53
  • @JanGöttmann You could convert the array to a list, see my [updated answer](https://stackoverflow.com/a/75602706/6574038) below. – jay.sf Mar 01 '23 at 11:03

2 Answers


Given that you want to "preserve the original structure" of your array, I think the only way to filter values is to replace them with NA:

dat[dat <= 0.3 | dat >= 3] <- NA

# same structure
dim(dat)
#> [1]  30   2 250

# for printing
head(dat[,,1])
#>           [,1] [,2]
#> [1,] 0.8434573    2
#> [2,] 0.5766103   NA
#> [3,] 1.3290549   NA
#> [4,]        NA    2
#> [5,]        NA   NA
#> [6,] 0.3165012   NA

Data from OP

n_subjects <- 30
n_obs <- 250

dat <- array(NaN,dim=c(n_subjects,2,n_obs))

set.seed(123)

dat[,1,] <- rexp(250,1)
dat[,2,] <- round(runif(250,1,5),0)

Created on 2023-03-01 by the reprex package (v2.0.1)

TimTeaFan
  • OK, I see the point! But with this approach we apply the filter only to one variable, resulting in e.g. #> [4,] NA 2, where dat[,1,4] is NA but dat[,2,4] is 2. So I could use a conditional loop to remove both values? Anyway, I think what I need is to drop all subjects which match the condition, so in the end I would have a 25x2x250 (a sketch of this follows below). @TimTeaFan – Jan Göttmann Mar 01 '23 at 11:00
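
Following up on that comment, a minimal base-R sketch (assuming the unmodified dat from the question, before any NAs were inserted) that drops every subject with at least one out-of-range reaction time instead of inserting NAs, shrinking the first dimension while keeping the 3D structure:

# keep only subjects whose reaction times all lie inside (0.3, 3)
ok <- apply(dat[, 1, ] > 0.3 & dat[, 1, ] < 3, 1, all)
dat_clean <- dat[ok, , , drop = FALSE]
dim(dat_clean)  # e.g. 25 x 2 x 250, depending on how many subjects pass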

You can't delete rows from an array, since that would disrupt its structure, but you can set values to NA using

dat[, 1, ][dat[, 1, ] <= 0.3 | dat[, 1, ] >= 3] <- NA_real_
dat
# , , 1
# 
#         [,1] [,2]
# [1,] 1.27194    2
# [2,]      NA    2
# [3,]      NA    5
# [4,] 1.27194    2
# 
# , , 2
# 
#         [,1] [,2]
# [1,]      NA    2
# [2,]      NA    5
# [3,] 1.27194    2
# [4,]      NA    2
# 
# , , 3
# 
#         [,1] [,2]
# [1,]      NA    5
# [2,] 1.27194    2
# [3,]      NA    2
# [4,]      NA    5

or, with the condition inverted (if you instead want to blank out the values inside the range),

dat[, 1, ][dat[, 1, ] > 0.3 & dat[, 1, ] < 3] <- NA_real_
dat
# , , 1
# 
#           [,1] [,2]
# [1,]        NA    2
# [2,] 3.2470325    2
# [3,] 0.1767256    5
# [4,]        NA    2
# 
# , , 2
# 
#           [,1] [,2]
# [1,] 3.2470325    2
# [2,] 0.1767256    5
# [3,]        NA    2
# [4,] 3.2470325    2
# 
# , , 3
# 
#           [,1] [,2]
# [1,] 0.1767256    5
# [2,]        NA    2
# [3,] 3.2470325    2
# [4,] 0.1767256    5

Alternatively, you could convert the array to a list, which is subsettable.

lapply(seq_len(dim(dat)[3]), \(i) dat[,,i]) |>
  lapply(\(.) .[.[, 1] > 0.3 & .[, 1] < 3,,drop=FALSE])
# [[1]]
#         [,1] [,2]
# [1,] 1.27194    2
# [2,] 1.27194    2
# 
# [[2]]
#         [,1] [,2]
# [1,] 1.27194    2
# 
# [[3]]
#         [,1] [,2]
# [1,] 1.27194    2
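
A small extension of the list idea (not part of the original answer): labelling the rows with the subject index before filtering keeps track of which subject each remaining row came from.

lapply(seq_len(dim(dat)[3]), \(i) {
  m <- dat[, , i]
  rownames(m) <- paste0("subj", seq_len(nrow(m)))  # hypothetical subject labels
  m[m[, 1] > 0.3 & m[, 1] < 3, , drop = FALSE]
})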

Data:

set.seed(313962)
dat <- array(NaN, dim = c(4, 2, 3))  # small example array; dimensions inferred from the output above
dat[, 1, ] <- rexp(3, 1)
dat[, 2, ] <- round(runif(3, 1, 5), 0)
jay.sf