6

I have the following dataframe (s):

s<-read.table(text = "V1    V2  V3  V4  V5  V6  V7  V8  V9  V10 
  1 0   62  64  44  NA  55  81  66  57  53  
  2 0   0   65  50  NA  56  79  69  52  55  
  3 0   0   0   57  NA  62  84  76  65  59  
  4 0   0   0   0   NA  30  70  61  41  36  
  5 0   0   0   0   NA  NA  NA  NA  NA  NA  
  6 0   0   0   0   0   0   66  63  51  44  
  7 0   0   0   0   0   0   0   80  72  72  
  8 0   0   0   0   0   0   0   0   68  64  
  9 0   0   0   0   0   0   0   0   0   47  
  10    0   0   0   0   0   0   0   0   0   0   ", header = TRUE)

As can be seen row 5 and column 5 in this case includes only NA and 0 values. I would like to omit them and to keep the order of lines and columns. There might be more column and rows in the same pattern and I would like to do the same. The size of the dataframe might be changed. The final result would be:

    V1  V2  V3  V4  V6  V7  V8  V9  V10 
1   0   62  64  44  55  81  66  57  53  
2   0   0   65  50  56  79  69  52  55  
3   0   0   0   57  62  84  76  65  59  
4   0   0   0   0   30  70  61  41  36  
6   0   0   0   0   0   66  63  51  44  
7   0   0   0   0   0   0   80  72  72  
8   0   0   0   0   0   0   0   68  64  
9   0   0   0   0   0   0   0   0   47  
10  0   0   0   0   0   0   0   0   0   

Is there a way to get the omitted row and column number (in this case 5), as well?

Avi
  • 2,247
  • 4
  • 30
  • 52
  • 1
    so what is the minimum nr of NA that would justify dumping a Row/column. Do all the Non-NA values have to be 0 to drop? – Serban Tanasa Apr 26 '16 at 18:57
  • As can be seen it is an upper triangle matrix. In each case the NA will be for the rows: from line number column to last column (end). And for the same column number: from the first line till the same row number (5 in this example) – Avi Apr 26 '16 at 19:01
  • This is probably obvious, but: you should use a matrix, not a data.frame. – Frank Apr 26 '16 at 19:28
  • I do use matrix. I would be glad if you can show an answer with input of matrix without any need to convert to dataframe. – Avi Apr 26 '16 at 19:34
  • Is there a way to get the omitted row and column (in this case 5)? – Avi Apr 26 '16 at 19:41

4 Answers4

4

You have to define more on when exactly you want to drop. In this case it looks like matrix at one side and diagonal always being 0.

However, In general, this is what I use

s[!rowSums(is.na(s))>1,!colSums(is.na(s))>1]

Considering 0's

s[!rowSums(is.na(s)|s==0)>9,!colSums(is.na(s)|s==0)>9]
Ananta
  • 3,671
  • 3
  • 22
  • 26
3

We can try

v1 <- colSums(is.na(s))
v2 <- colSums(s==0, na.rm=TRUE)
j1 <- !(v1>0 & (v1+v2)==nrow(s) & v2 >0)

v3 <- rowSums(is.na(s))
v4 <- rowSums(s==0, na.rm=TRUE)
i1 <- !(v3>0 & (v3+v4)==ncol(s) & v3 >0)
s[i1, j1]
#   V1 V2 V3 V4 V6 V7 V8 V9 V10
#1   0 62 64 44 55 81 66 57  53
#2   0  0 65 50 56 79 69 52  55
#3   0  0  0 57 62 84 76 65  59
#4   0  0  0  0 30 70 61 41  36
#6   0  0  0  0  0 66 63 51  44
#7   0  0  0  0  0  0 80 72  72
#8   0  0  0  0  0  0  0 68  64
#9   0  0  0  0  0  0  0  0  47
#10  0  0  0  0  0  0  0  0   0

Suppose if we change one of the values in 's'

 s$V7[3] <- NA

By running the above code, the output will be

#   V1 V2 V3 V4 V6 V7 V8 V9 V10
#1   0 62 64 44 55 81 66 57  53
#2   0  0 65 50 56 79 69 52  55
#3   0  0  0 57 62 NA 76 65  59
#4   0  0  0  0 30 70 61 41  36
#6   0  0  0  0  0 66 63 51  44
#7   0  0  0  0  0  0 80 72  72
#8   0  0  0  0  0  0  0 68  64
#9   0  0  0  0  0  0  0  0  47
#10  0  0  0  0  0  0  0  0   0

NOTE: The OP's condition is includes only NA and 0 values. I would like to omit them

akrun
  • 874,273
  • 37
  • 540
  • 662
3

I was going to suggest:

sclean <- s[rowSums(s == 0|is.na(s)) != ncol(s) | (rowSums(s == 0, na.rm=TRUE) == ncol(s)),
        colSums(s == 0|is.na(s) )!= nrow(s) | colSums(s == 0, na.rm=TRUE) == nrow(s)]
Serban Tanasa
  • 3,592
  • 2
  • 23
  • 45
  • I don't think that is correct as the answer is based on NA values greater than 1. It could have more than one NAs with non-NA (other than 0s). For an example, if we do `s$V7[3] <- NA`, then it omits that column while my solution keeps it. – akrun Apr 26 '16 at 19:19
  • 1
    @Avi Added condition to keep 0 columns. – Serban Tanasa Apr 26 '16 at 19:43
  • @akrun, my solution seems to keep v7 in your test case. – Serban Tanasa Apr 26 '16 at 19:46
1

You could try the following:

myRowSums <- rowSums(is.na(s) | s == 0)
myColSums <- colSums(is.na(s) | s == 0)

sSmall <- s[which(myRowSums != ncol(s)), which(myColSums != nrow(s))]

It works for the following dataset to drop all columns and rows that are entirely made up of 0s and NAs.

s <- data.frame(a=c(0, rnorm(5), 0), b=c(0, rnorm(2), NA, NA,1, NA), c=c(rep(c(0,NA), 3), 0))
lmo
  • 37,904
  • 9
  • 56
  • 69