I have a data.table, and I want to remove the rows in which all columns except two particular columns are NA. For example, given a data.table like:
> ww2
    Sepal.Length Sepal.Width Petal.Length Petal.Width Species index
 1:          5.1         3.5          1.4         0.2  setosa     1
 2:          4.9         3.0          1.4         0.2  setosa     2
 3:          4.7         3.2          1.3         0.2  setosa     3
 4:          4.6         3.1          1.5         0.2  setosa     4
 5:          5.0         3.6          1.4         0.2  setosa     5
 6:          5.1         3.5          1.4         0.2 dffdsdf     1
 7:          4.9         3.0          1.4         0.2 dffdsdf     2
 8:          4.7         3.2          1.3         0.2 dffdsdf     3
 9:           NA          NA           NA          NA dffdsdf     4
10:           NA          NA           NA          NA dffdsdf     5
Its dput is:
structure(list(Sepal.Length = c(5.1, 4.9, 4.7, 4.6, 5, 5.1, 4.9,
4.7, NA, NA), Sepal.Width = c(3.5, 3, 3.2, 3.1, 3.6, 3.5, 3,
3.2, NA, NA), Petal.Length = c(1.4, 1.4, 1.3, 1.5, 1.4, 1.4,
1.4, 1.3, NA, NA), Petal.Width = c(0.2, 0.2, 0.2, 0.2, 0.2, 0.2,
0.2, 0.2, NA, NA), Species = structure(c(1L, 1L, 1L, 1L, 1L,
4L, 4L, 4L, 4L, 4L), class = "factor", .Label = c("setosa", "versicolor",
"virginica", "dffdsdf")), index = c(1L, 2L, 3L, 4L, 5L, 1L, 2L,
3L, 4L, 5L)), .Names = c("Sepal.Length", "Sepal.Width", "Petal.Length",
"Petal.Width", "Species", "index"), row.names = c(NA, -10L), class = "data.frame")
In the data.table above, I want to remove rows 9 and 10. My actual data.table is much bigger and has many more columns, so it is impractical to explicitly list the columns that are NA. However, the columns that are not NA are fixed: there are two of them, and in this example they are index and Species.
I am looking for an efficient and fast solution to this.
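To make the requirement concrete, here is a minimal sketch of the behaviour I am after, not necessarily the fastest way to do it. The names keep_cols, check_cols, has_data and result are just illustrative, and the table is rebuilt by hand from the values above (with simplified factor levels) rather than from the dput:

```r
library(data.table)

# Stand-in for ww2, rebuilt from the values shown above.
# Assumption: factor levels are simplified compared with the dput.
ww2 <- data.table(
  Sepal.Length = c(5.1, 4.9, 4.7, 4.6, 5.0, 5.1, 4.9, 4.7, NA, NA),
  Sepal.Width  = c(3.5, 3.0, 3.2, 3.1, 3.6, 3.5, 3.0, 3.2, NA, NA),
  Petal.Length = c(1.4, 1.4, 1.3, 1.5, 1.4, 1.4, 1.4, 1.3, NA, NA),
  Petal.Width  = c(0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, NA, NA),
  Species      = factor(rep(c("setosa", "dffdsdf"), each = 5)),
  index        = rep(1:5, times = 2)
)

# The two columns that are always filled in (hypothetical variable names).
keep_cols  <- c("Species", "index")

# Every other column is checked for NA, so none has to be listed by hand.
check_cols <- setdiff(names(ww2), keep_cols)

# TRUE for rows that have at least one non-NA value among check_cols.
has_data <- rowSums(!is.na(ww2[, check_cols, with = FALSE])) > 0

# Keep only those rows; rows 9 and 10 are dropped.
result <- ww2[has_data]
```

This builds a full logical matrix with is.na(), which may be too slow or memory-hungry on a really big table, hence the question.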