Dropping all left NAs in a dataframe and left shifting the cleaned rows

Question

I have the following data frame dat, in which some rows start with a varying number of NAs:

dat <- as.data.frame(rbind(c(NA,NA,1,3,5,NA,NA,NA), c(NA,1:3,6:8,NA), c(1,NA,2:6,NA)))
dat

#  V1 V2 V3 V4 V5 V6 V7 V8
#  NA NA  1  3  5 NA NA NA
#  NA  1  2  3  6  7  8 NA
#   1 NA  2  3  4  5  6 NA

My aim is to delete all the NAs at the beginning of each row and to left shift the row values (adding NAs at the end of the shifted rows accordingly, in order to keep their length constant).

The following code works as expected:

for (i in seq_len(nrow(dat))) {
    if (is.na(dat[i, 1])) {
        dat1 <- dat[i, min(which(!is.na(dat[i, ]))):length(dat[i, ])]
        dat[i, ] <- data.frame(dat1, t(rep(NA, ncol(dat) - length(dat1))))
    }
}

dat

returning:

#  V1 V2 V3 V4 V5 V6 V7 V8
#   1  3  5 NA NA NA NA NA
#   1  2  3  6  7  8 NA NA
#   1 NA  2  3  4  5  6 NA

I was wondering whether there is a more direct way to do this, without a for-loop and using the tail function.

On that last point, min(which(!is.na(dat[1,]))) returns 3, as expected. But tail(dat[1,],min(which(!is.na(dat[1,])))) returns the original row unchanged, and I don't understand why.

Thank you very much for any suggestions.

Is it just by coincidence that the non-`NA` values per row are sorted in increasing order from left to right? Or is that what you are trying to do (with all `NA`s at the right)? — talat, May 14 '14 at 12:28
It is by coincidence, the non missing entries could take any value. The crucial part is that if I have NAs on the left (starting from the first column) I need to get rid of all of them. Thanks — Stefano Lombardi, May 14 '14 at 12:37

score 7 · Answer 1 · answered May 14 '14 at 13:01

If you just want all the NAs pushed to the end of each row, you could try:

dat <- as.data.frame(rbind(c(NA,NA,1,3,5,NA,NA,NA), c(NA,1:3,6:8,NA), c(1:7,NA)))
dat[3,2] <- NA
dat

#   V1 V2 V3 V4 V5 V6 V7 V8
# 1 NA NA  1  3  5 NA NA NA
# 2 NA  1  2  3  6  7  8 NA
# 3  1 NA  3  4  5  6  7 NA

# order(is.na(...)) puts the non-NA positions first and leaves ties in their
# original order, so the values themselves are not sorted
dat.new <- do.call(rbind, lapply(1:nrow(dat), function(x) t(matrix(dat[x, order(is.na(dat[x, ]))]))))
colnames(dat.new) <- colnames(dat)
dat.new

#      V1 V2 V3 V4 V5 V6 V7 V8
# [1,] 1  3  5  NA NA NA NA NA
# [2,] 1  2  3  6  7  8  NA NA
# [3,] 1  3  4  5  6  7  NA NA
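
The same idea can be sketched with apply, which works on plain vectors and so returns an ordinary numeric matrix (the left-aligned entries in the output above suggest that dat.new is a list-type matrix). This is only an illustrative variant, not part of the original answer; the name pushed is made up for the example. Like the code above, it moves every NA to the end of its row, including interior ones.

```r
# Same example data as in the answer above
dat <- as.data.frame(rbind(c(NA,NA,1,3,5,NA,NA,NA), c(NA,1:3,6:8,NA), c(1:7,NA)))
dat[3,2] <- NA

# For each row, take the non-NA positions first and the NA positions last.
# order() leaves ties in their original order, so values are not sorted.
# 'pushed' is a hypothetical name used only in this sketch.
pushed <- t(apply(dat, 1, function(x) unname(x[order(is.na(x))])))
colnames(pushed) <- colnames(dat)   # restore the original column labels
pushed
```

If a data frame is needed afterwards, as.data.frame(pushed) converts the matrix back.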

score 4 · Accepted Answer · answered May 14 '14 at 12:39

I don't think you can do this without a loop.

dat <- as.data.frame(rbind(c(NA,NA,1,3,5,NA,NA,NA), c(NA,1:3,6:8,NA), c(1:7,NA)))
dat[3,2] <- NA

#   V1 V2 V3 V4 V5 V6 V7 V8
# 1 NA NA  1  3  5 NA NA NA
# 2 NA  1  2  3  6  7  8 NA
# 3  1 NA  3  4  5  6  7 NA

t(apply(dat, 1, function(x) {
  if (is.na(x[1])) {
    y <- x[-seq_len(which.min(is.na(x))-1)]
    length(y) <- length(x)
    y
  } else x
}))

#     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
#[1,]    1    3    5   NA   NA   NA   NA   NA
#[2,]    1    2    3    6    7    8   NA   NA
#[3,]    1   NA    3    4    5    6    7   NA

Then turn the matrix into a data.frame if you must.
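
As a minimal sketch of that last step, assuming the original column names should be kept: the names shifted and res are made up for this example, and the function body is the one from the answer.

```r
dat <- as.data.frame(rbind(c(NA,NA,1,3,5,NA,NA,NA), c(NA,1:3,6:8,NA), c(1:7,NA)))
dat[3,2] <- NA

# Shift each row that starts with NA, exactly as in the answer
shifted <- t(apply(dat, 1, function(x) {
  if (is.na(x[1])) {
    y <- x[-seq_len(which.min(is.na(x)) - 1)]
    length(y) <- length(x)   # pad with NA up to the original length
    y
  } else x
}))

# Convert the matrix back to a data frame and restore the column names
res <- as.data.frame(shifted)
names(res) <- names(dat)
res
```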

Roland


Thank you very much, but this does not answer the initial question. Can you suggest a way to use the `tail` function? Are you sure it is not possible to use `tail` in combination with one of the `apply` family functions? – Stefano Lombardi May 14 '14 at 12:54
You can use `tail` instead of `y <- x[-seq_len(which.min(is.na(x))-1)]`, but that doesn't offer any advantage. – Roland May 14 '14 at 12:57
Thanks for the help. The problem with `tail` was that I had to define `dat[i,]` as integer. Regards, S. – Stefano Lombardi May 15 '14 at 09:29
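
For completeness, here is a hedged sketch of a formulation with no explicit loop or apply call, based on matrix indexing. It is not taken from any of the answers. The names m, lead, src_col, valid and out are made up for the example, and it assumes all columns are numeric. Rows made up entirely of NAs are left unchanged.

```r
dat <- as.data.frame(rbind(c(NA,NA,1,3,5,NA,NA,NA), c(NA,1:3,6:8,NA), c(1:7,NA)))
dat[3,2] <- NA

m  <- as.matrix(dat)
nc <- ncol(m)

# Number of leading NAs per row: position of the first non-NA value minus 1
lead <- max.col(!is.na(m), ties.method = "first") - 1L

# For every target cell, the column it should be copied from
# (lead is recycled down the columns, so row i is shifted by lead[i])
src_col <- col(m) + lead
valid   <- src_col <= nc          # cells that still have a source column

# Start from an all-NA matrix and fill in the shifted values
out <- matrix(NA_real_, nrow(m), nc, dimnames = dimnames(m))
out[valid] <- m[cbind(row(m)[valid], src_col[valid])]
out
```

Interior NAs, such as the one in the third row, stay where they are because only the leading run is counted.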

Stefano Lombardi · Answer 3 · 2014-05-15T09:37:11.173

Here is a solution that uses the tail function:

dat <- as.data.frame(rbind(c(NA,NA,1,3,5,NA,NA,NA), c(NA,1:3,6:8,NA), c(1:7,NA)))
dat

for (i in 1:nrow(dat)) {

    if (is.na(dat[i,1])) {

        # drops the initial NAs of the row (if the row starts with NAs);
        # as.integer() turns the one-row data frame into a plain vector, so that
        # tail() acts on elements rather than on rows (use as.numeric() instead
        # if the values are not whole numbers)
        dat1 <- tail(as.integer(dat[i,]), -min(which(!is.na(dat[i,]))-1))

        # adds final NAs to keep the row length constant (i.e. conformable with 'dat')
        length(dat1) <- ncol(dat)

        dat[i,] <- dat1

    }

}

dat
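
The reason the conversion to a vector matters can be illustrated with a short sketch: dat[1,] is still a data frame, and tail on a data frame selects rows, not elements. The names row1, k and v are made up for this example.

```r
dat <- as.data.frame(rbind(c(NA,NA,1,3,5,NA,NA,NA), c(NA,1:3,6:8,NA), c(1:7,NA)))

row1 <- dat[1, ]                # still a data frame: 1 row, 8 columns
k <- min(which(!is.na(row1)))   # position of the first non-NA value

# On a data frame, tail() returns the last n ROWS; there is only one row,
# so the whole row comes back
tail(row1, k)

# On a plain vector, tail() works on elements, and a negative n drops
# that many elements from the front
v <- unlist(row1, use.names = FALSE)
tail(v, -(k - 1))
```

Here unlist is used for the conversion so that non-integer values would survive as well. The answer above uses as.integer, which gives the same result for this data.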
