2

I want to get a subset of my dataframe by keeping rows that have numeric in all columns so

>small
     0    16h    24h    48h
ID1  1    0      0   
ID2  453  254    21     12  
ID3  true  3     2      1
ID4  65    23    12     12

would be

>small_numeric
     0    16h    24h    48h  
ID2  453  254    21     12  
ID4  65    23    12     1

I tried

sapply(small, is.numeric)

but got this

0      16h    24h    48h   
FALSE  FALSE  FALSE  FALSE 
Jaap
  • 81,064
  • 34
  • 182
  • 193
marie
  • 63
  • 9

1 Answers1

5

Using:

small[!rowSums(is.na(sapply(small, as.numeric))),]

gives:

      0 16h 24h 48h
ID2 453 254  21  12
ID4  65  23  12  12

What this does:

  • With sapply(small, as.numeric) you force all columns to numeric. Non-numeric values are converted to NA-values as a result.
  • Next you count the number of NA-values with rowSums(is.na(sapply(small, as.numeric))) which gives you back a numeric vector, [1] 1 0 1 0, with the number of non-numeric values by row.
  • Negating this with ! gives you a logical vector of the rows where all columns have numeric values.

Used data:

small <- read.table(text="     0    16h    24h    48h
ID1  1    0      0     
ID2  453  254    21     12  
ID3  true  3     2      1
ID4  65    23    12     12", header=TRUE, stringsAsFactors = FALSE, fill = TRUE, check.names = FALSE)

For the updated example data, the problem is that columns with non-numeric values are factors instead of character. There you'll have to adapt the above code as follows:

testdata[!rowSums(is.na(sapply(testdata[-1], function(x) as.numeric(as.character(x))))),]

which gives:

      0  16h  24h  48h   NA
ID2 ID2   46   23   23   48
ID3 ID3   44   10   14   22
ID4 ID4   17   11    4   24
ID5 ID5   13    5    3   18
ID7 ID7 4387 4216 2992 3744

Extra explanation:

  • When converting factor-columns to numeric, you will have to convert those to character first. Hence: as.numeric(as.character(x)). If you don't do that, as.numeric with give back the numbers of the factor levels.
  • I used testdata[-1] as I supposed that you didn't want to include the first column in the check for numeric values.
Jaap
  • 81,064
  • 34
  • 182
  • 193
  • Still getting all rows (unfiltered) – marie Jul 07 '17 at 15:29
  • @marie It works on the example data. If it doesn't work on your real data, please include a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/5963610) in your question, so I can see why it doesn't work. – Jaap Jul 07 '17 at 15:32
  • > dput(head(testdata,7)) structure(list(`0` = structure(1:7, .Label = c("ID1", "ID2", "ID3", "ID4", "ID5", "ID6", "ID7"), class = "factor"), `16h` = structure(c(1L, 6L, 5L, 3L, 2L, 7L, 4L), .Label = c("1", "13", "17", "4387", "44", "46", "true"), class = "factor"), `24h` = c(0L, 23L, 10L, 11L, 5L, 13L, 4216L), `48h` = c(0L, 23L, 14L, 4L, 3L, 24L, 2992L ), `NA` = c(NA, 48L, 22L, 24L, 18L, 2L, 3744L)), .Names = c("0", "16h", "24h", "48h", NA), row.names = c("ID1", "ID2", "ID3", "ID4", "ID5", "ID6", "ID7"), class = "data.frame") – marie Jul 07 '17 at 16:29
  • This is great, thank you so much for the explanation, it all makes much more sense now. I have now used read.csv with stringsAsFactors=FALSE which avoids using the more complicated code further down the line. – marie Jul 07 '17 at 20:35