
I have the following data frame, dat, in which some rows start with a varying number of NAs:

dat <- as.data.frame(rbind(c(NA,NA,1,3,5,NA,NA,NA), c(NA,1:3,6:8,NA), c(1:7,NA)))
dat

#   V1 V2 V3 V4 V5 V6 V7 V8
# 1 NA NA  1  3  5 NA NA NA
# 2 NA  1  2  3  6  7  8 NA
# 3  1  2  3  4  5  6  7 NA

My aim is to delete the NAs at the beginning of each row and shift the remaining values to the left, padding the end of each shifted row with NAs so that its length stays the same.
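Stated per row, the operation only needs the position of the first non-NA value. Here is a minimal sketch on a single numeric vector. The helper name `shift_left` is hypothetical, and the `cumsum()` mask is just one way to find the leading NAs:

```r
# Hypothetical helper: drop the leading NAs of one row, then pad with NAs
shift_left <- function(x) {
  leading <- cumsum(!is.na(x)) == 0   # TRUE only before the first non-NA value
  y <- x[!leading]                    # keep everything from the first non-NA on
  length(y) <- length(x)              # pad with trailing NAs to the original length
  y
}

shift_left(c(NA, NA, 1, 3, 5, NA, NA, NA))  # 1 3 5 NA NA NA NA NA
shift_left(c(1, NA, 3, 4, NA))              # no leading NA: unchanged, interior NA kept
```

Interior and trailing NAs are left alone, because only positions before the first non-NA value are dropped.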

The following code works as expected:

for (i in seq_len(nrow(dat))) {
    if (is.na(dat[i, 1])) {
        dat1 <- dat[i, min(which(!is.na(dat[i, ]))):ncol(dat)]
        dat[i, ] <- data.frame(dat1, t(rep(NA, ncol(dat) - length(dat1))))
    }
}

dat

returning:

#   V1 V2 V3 V4 V5 V6 V7 V8
# 1  1  3  5 NA NA NA NA NA
# 2  1  2  3  6  7  8 NA NA
# 3  1  2  3  4  5  6  7 NA

I was wondering whether there is a more direct way to do this, without a for loop, possibly using the tail function.

On this last point: min(which(!is.na(dat[1,]))) returns 3, as expected. But tail(dat[1,], min(which(!is.na(dat[1,])))) returns the original row unchanged, and I don't understand why.
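Here is a minimal illustration of what seems to be going on, using the same dat. `tail()` on a data frame works on rows, not columns. Even on a plain vector, a positive `n` keeps the last `n` elements rather than dropping the first ones. The object names `row1`, `k` and `v` are only for illustration:

```r
dat <- as.data.frame(rbind(c(NA, NA, 1, 3, 5, NA, NA, NA),
                           c(NA, 1:3, 6:8, NA),
                           c(1:7, NA)))

row1 <- dat[1, ]                 # a data frame with 1 row and 8 columns
k <- min(which(!is.na(row1)))    # first non-NA column: 3

# tail() on a data frame counts rows, so the "last 3 rows" of a
# 1-row data frame is just that same row
nrow(tail(row1, k))              # 1

# On a plain vector, a positive n keeps the last n elements ...
v <- unlist(row1)
tail(v, k)                       # last 3 values: NA NA NA

# ... while a negative n drops the first |n| elements
tail(v, -(k - 1))                # V3..V8: 1 3 5 NA NA NA
```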

Thank you very much for any suggestions.

Stefano Lombardi
  • Is it just by coincidence that the non-`NA` values per row are sorted in increasing order from left to right? Or is that what you are trying to do (with all `NA`s at the right)? – talat May 14 '14 at 12:28
  • It is a coincidence; the non-missing entries could take any value. The crucial part is that if a row starts with NAs (from the first column onward), I need to get rid of all of them. Thanks – Stefano Lombardi May 14 '14 at 12:37

3 Answers


If you just want all NAs to be pushed to the end of each row, you could try the following (note that this also moves interior NAs, as row 3 shows):

dat <- as.data.frame(rbind(c(NA,NA,1,3,5,NA,NA,NA), c(NA,1:3,6:8,NA), c(1:7,NA)))
dat[3,2] <- NA
dat
#   V1 V2 V3 V4 V5 V6 V7 V8
# 1 NA NA  1  3  5 NA NA NA
# 2 NA  1  2  3  6  7  8 NA
# 3  1 NA  3  4  5  6  7 NA

dat.new <- do.call(rbind, lapply(1:nrow(dat), function(x) t(matrix(dat[x, order(is.na(dat[x,]))]))))
colnames(dat.new) <- colnames(dat)
dat.new
#      V1 V2 V3 V4 V5 V6 V7 V8
# [1,] 1  3  5  NA NA NA NA NA
# [2,] 1  2  3  6  7  8  NA NA
# [3,] 1  3  4  5  6  7  NA NA
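Here is a variant of the same `order(is.na(x))` idea, sketched with `apply()` so the result is an ordinary numeric data frame. This is an alternative formulation, not the answerer's code. Like the code above, it moves interior NAs to the end as well:

```r
dat <- as.data.frame(rbind(c(NA, NA, 1, 3, 5, NA, NA, NA),
                           c(NA, 1:3, 6:8, NA),
                           c(1:7, NA)))
dat[3, 2] <- NA

# order(is.na(x)) puts FALSE (non-NA) before TRUE (NA); order() keeps ties
# in their original order, so the non-NA values are not re-sorted
dat.new <- as.data.frame(t(apply(dat, 1, function(x) x[order(is.na(x))])))
colnames(dat.new) <- colnames(dat)
dat.new
```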
Silence Dogood

I don't think you can do this without a loop.

dat <- as.data.frame(rbind(c(NA,NA,1,3,5,NA,NA,NA), c(NA,1:3,6:8,NA), c(1:7,NA)))
dat[3,2] <- NA

#   V1 V2 V3 V4 V5 V6 V7 V8
# 1 NA NA  1  3  5 NA NA NA
# 2 NA  1  2  3  6  7  8 NA
# 3  1 NA  3  4  5  6  7 NA

t(apply(dat, 1, function(x) {
  if (is.na(x[1])) {
    y <- x[-seq_len(which.min(is.na(x))-1)]
    length(y) <- length(x)
    y
  } else x
}))

#     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
#[1,]    1    3    5   NA   NA   NA   NA   NA
#[2,]    1    2    3    6    7    8   NA   NA
#[3,]    1   NA    3    4    5    6    7   NA

Then turn the matrix into a data.frame if you must.
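Here is a minimal sketch of that last conversion, reusing the function above. The names `res` and `res.df` are illustrative. The column names are copied back from dat because the returned matrix has none (note the `[,1] [,2] ...` headers above):

```r
dat <- as.data.frame(rbind(c(NA, NA, 1, 3, 5, NA, NA, NA),
                           c(NA, 1:3, 6:8, NA),
                           c(1:7, NA)))
dat[3, 2] <- NA

res <- t(apply(dat, 1, function(x) {
  if (is.na(x[1])) {
    y <- x[-seq_len(which.min(is.na(x)) - 1)]
    length(y) <- length(x)
    y
  } else x
}))

# Convert to a data frame and restore the original column names
res.df <- setNames(as.data.frame(res), colnames(dat))
res.df
```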

Roland
  • Thank you very much, but this does not answer the initial question. Can you suggest a way to use the `tail` function? Are you sure it is not possible to use `tail` in combination with one of the `apply` family functions? – Stefano Lombardi May 14 '14 at 12:54
  • You can use `tail` instead of `y <- x[-seq_len(which.min(is.na(x))-1)]`, but that doesn't offer any advantage. – Roland May 14 '14 at 12:57
  • Thanks for the help. The problem with `tail` was that I first had to convert `dat[i,]` to an integer vector. Regards, S. – Stefano Lombardi May 15 '14 at 09:29
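Following the comment about `tail()`, here is a sketch of the same `apply()` answer with `tail()` in place of the `seq_len()` indexing. The helper name `drop_leading_na` is hypothetical. The `is.na(x[1])` guard matters: for a row without leading NAs the call would be `tail(x, 0)`, which returns an empty vector.

```r
dat <- as.data.frame(rbind(c(NA, NA, 1, 3, 5, NA, NA, NA),
                           c(NA, 1:3, 6:8, NA),
                           c(1:7, NA)))
dat[3, 2] <- NA

drop_leading_na <- function(x) {        # hypothetical helper name
  if (is.na(x[1])) {
    k <- which.min(is.na(x))            # position of the first non-NA value
    y <- tail(x, -(k - 1))              # negative n: drop the first k - 1 elements
    length(y) <- length(x)              # pad with NA at the end
    y
  } else x                              # no leading NA: return the row unchanged
}

res <- t(apply(dat, 1, drop_leading_na))
res
```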

Here is a solution using the tail function:

dat <- as.data.frame(rbind(c(NA,NA,1,3,5,NA,NA,NA), c(NA,1:3,6:8,NA), c(1:7,NA)))
dat

for (i in seq_len(nrow(dat))) {
  if (is.na(dat[i, 1])) {
    # drop the leading NAs of the row; unlist() turns the one-row data frame
    # into a plain vector, because tail() on a data frame counts rows, not columns
    # (as.integer() also works here but truncates non-integer values)
    dat1 <- tail(unlist(dat[i, ]), -min(which(!is.na(dat[i, ])) - 1))

    # add trailing NAs to keep the row length constant (i.e. conformable with 'dat')
    length(dat1) <- ncol(dat)

    dat[i, ] <- dat1
  }
}

dat
Stefano Lombardi
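For comparison with the loop and `apply()` answers above, one way to avoid a per-row function call is to compute every source position at once with matrix indexing. This is a sketch of a technique not used in the original answers. It assumes all columns are numeric, so that `as.matrix()` yields a numeric matrix. The names `first`, `src`, `ok`, `out` and `dat.shifted` are illustrative.

```r
dat <- as.data.frame(rbind(c(NA, NA, 1, 3, 5, NA, NA, NA),
                           c(NA, 1:3, 6:8, NA),
                           c(1:7, NA)))
dat[3, 2] <- NA

m <- as.matrix(dat)
n <- ncol(m)

# column of the first non-NA value in each row (an all-NA row gives 1)
first <- max.col((!is.na(m)) * 1, ties.method = "first")

# cell [i, j] of the result comes from column j + first[i] - 1 of row i;
# the per-row shift is recycled down each column, i.e. once per row
src <- col(m) + (first - 1)
ok  <- src <= n                         # cells that still have a source value

out <- matrix(NA_real_, nrow(m), n, dimnames = dimnames(m))
out[ok] <- m[cbind(row(m)[ok], src[ok])]
dat.shifted <- as.data.frame(out)
dat.shifted
```

Rows that do not start with NA get a shift of zero and are copied unchanged, so interior NAs (row 3) are kept, as in the loop-based answers.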