3

I have a vector:

a <- c(NA,1:5,NA,NA,1:3, rep(NA,round(runif(1,0,100))))

I need to remove the trailing NAs. Desired result:

c(NA, 1:5, NA, NA, 1:3)
Márcio Mocellin
  • Related: [Remove leading and trailing NA](https://stackoverflow.com/questions/42759027/remove-leading-and-trailing-na) – Henrik Feb 02 '21 at 00:00

6 Answers

4

You can do

a[1:max(which(!is.na(a)))]
# [1] NA  1  2  3  4  5 NA NA  1  2  3

This subsets the vector from position 1 up to the position of the last non-NA value.
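One edge case: if the vector contains only NAs, `which()` returns an empty result and `max()` gives `-Inf` with a warning. A minimal sketch of a guarded version follows; the function name `trim_trailing_na` is made up for illustration and is not part of the original answer.

```r
# Hypothetical helper: same idea as above, with a guard for all-NA input
trim_trailing_na <- function(x) {
  keep <- which(!is.na(x))               # positions of non-NA values
  if (length(keep) == 0L) return(x[0])   # all NA (or empty): nothing to keep
  x[seq_len(max(keep))]                  # keep everything up to the last non-NA
}

trim_trailing_na(c(NA, 1:5, NA, NA, 1:3, NA, NA))
trim_trailing_na(c(NA, NA))
```

Using `seq_len()` rather than `1:n` is a small defensive choice, since `1:0` counts down instead of producing an empty index.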

markus
3

One option would be

a[rev(cumprod(rev(is.na(a)))) == 0]
# [1] NA  1  2  3  4  5 NA NA  1  2  3

Here are the steps:

(a <- c(NA, 1:5, NA, NA, 1:3, NA, NA))
# [1] NA  1  2  3  4  5 NA NA  1  2  3 NA NA
is.na(a)
# [1]  TRUE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE FALSE FALSE FALSE  TRUE  TRUE
rev(is.na(a))
# [1]  TRUE  TRUE FALSE FALSE FALSE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE  TRUE
cumprod(rev(is.na(a)))
# [1] 1 1 0 0 0 0 0 0 0 0 0 0 0
rev(cumprod(rev(is.na(a))))
# [1] 0 0 0 0 0 0 0 0 0 0 0 1 1
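As a quick sanity check, the same expression can be wrapped in a function and tried on the boundary cases. This is only a sketch, and `trim_cumprod` is a name invented here. The cumulative product stays at 1 only while the reversed vector is still inside its leading run of NAs, so a vector without trailing NAs is left alone and an all-NA vector comes back empty.

```r
# Hypothetical wrapper around the cumprod expression from this answer
trim_cumprod <- function(x) x[rev(cumprod(rev(is.na(x)))) == 0]

trim_cumprod(c(NA, 1:3, NA, NA))  # trailing NAs dropped
trim_cumprod(c(NA, 1:3))          # no trailing NA, vector unchanged
trim_cumprod(c(NA, NA))           # all NA, result is empty
```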
Julius Vainora
2

You can find the last position that is not NA and subset up to it:

> a[1:max(which(!is.na(a)))]
 [1] NA  1  2  3  4  5 NA NA  1  2  3
CT Hall
1

Also a possibility:

a[cumsum(!is.na(a)) != max(cumsum(!is.na(a))) * is.na(a)]

 [1] NA  1  2  3  4  5 NA NA  1  2  3

In individual steps:

is.na(a)

 [1]  TRUE FALSE FALSE FALSE FALSE

cumsum(!is.na(a))

 [1] 0 1 2 3 4

cumsum(!is.na(a)) != max(cumsum(!is.na(a)))

 [1]  TRUE  TRUE  TRUE  TRUE  TRUE

cumsum(!is.na(a)) != max(cumsum(!is.na(a))) * is.na(a)

 [1]  TRUE  TRUE  TRUE  TRUE  TRUE

Just for fun, a little benchmarking:

library(microbenchmark)

a <- rep(a, 1e5)

microbenchmark(
  markus = a[1:max(which(!is.na(a)))],
  Julius_Vainora = a[rev(cumprod(rev(is.na(a)))) == 0],
  Kim = rm_NA_tail(a),
  tmfmnk = a[cumsum(!is.na(a)) != max(cumsum(!is.na(a))) * is.na(a)],
  nsinghs = a[1:(length(a) - rle(is.na(rev(a)))$lengths[1])],
  times = 5
)

Unit: milliseconds
           expr      min       lq     mean   median       uq       max neval cld
         markus 150.7346 153.0674 156.4194 153.3031 159.4718  165.5201     5 a  
 Julius_Vainora 393.8520 418.8186 616.3269 703.4022 749.6600  815.9018     5  bc
            Kim 370.7680 382.1826 536.0828 632.0031 642.1882  653.2720     5  bc
         tmfmnk 390.2626 415.2378 466.4245 415.8310 423.3828  687.4082     5  b 
        nsinghs 537.0404 781.1403 798.6929 793.1027 842.6777 1039.5033     5   c
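Timings only mean something if the expressions return the same thing, so it may be worth checking that first. The sketch below does so on a small fixed vector; the names `a_small`, `expected` and `results` are made up for this example. `rm_NA_tail()` is left out only to avoid the data.table dependency.

```r
# Small fixed input with three trailing NAs (illustrative, not the benchmark data)
a_small  <- c(NA, 1:5, NA, NA, 1:3, NA, NA, NA)
expected <- c(NA, 1:5, NA, NA, 1:3)

# The base-R expressions from the benchmark, applied to the small input
results <- list(
  markus         = a_small[1:max(which(!is.na(a_small)))],
  Julius_Vainora = a_small[rev(cumprod(rev(is.na(a_small)))) == 0],
  tmfmnk         = a_small[cumsum(!is.na(a_small)) != max(cumsum(!is.na(a_small))) * is.na(a_small)],
  nsinghs        = a_small[1:(length(a_small) - rle(is.na(rev(a_small)))$lengths[1])]
)

# TRUE for every approach that reproduces the desired result
agree <- vapply(results, identical, logical(1), expected)
agree
```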
tmfmnk
0

I think this works:

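rm_NA_tail <- function(a) {
  # Only trim when the vector actually ends in NA
  if (is.na(a[length(a)])) {
    # rleid() numbers consecutive runs; the highest id is the trailing NA run,
    # so keep every element that does not belong to that run
    return(a[is.na(match(data.table::rleid(a), max(data.table::rleid(a))))])
  } else {
    return(a)
  }
}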
Kim
0

This can be done using rle():

a[1:(length(a) - rle(is.na(rev(a)))$lengths[1])]
#  [1] NA  1  2  3  4  5 NA NA  1  2  3

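rle(is.na(rev(a)))$lengths[1] gives the length of the first run in the reversed vector. When the vector ends in NA, that is the number of trailing NAs, and subtracting it from the total length gives the last index to keep. If the vector does not end in NA, the first run consists of non-NA values, so the expression would drop real data; check for a trailing NA before using it.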
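A guarded variant might look like the sketch below. The function name `trim_with_rle` is invented for illustration. It returns the input unchanged unless the first run of the reversed vector is a run of NAs.

```r
# Hypothetical helper: rle()-based trimming that only fires on a trailing NA run
trim_with_rle <- function(x) {
  if (length(x) == 0L) return(x)       # nothing to trim in an empty vector
  runs <- rle(is.na(rev(x)))           # runs of NA / non-NA, starting from the end
  if (!runs$values[1]) return(x)       # last element is not NA: leave x alone
  x[seq_len(length(x) - runs$lengths[1])]
}

trim_with_rle(c(NA, 1:5, NA, NA, 1:3, NA, NA))
trim_with_rle(c(NA, 1, 2))  # no trailing NA, returned as is
```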

cropgen