I need to do a two-step analysis of the following data:

1 5 1 -2
2 6 3 4
1 5 4 -3
NA NA NA NA
2 5 4 -4

Step 1. Remove all NA rows (these are always whole rows, not cells).
Step 2. Sort the rows by the values in the 4th column in descending order.

The result should be the following:

2 6 3 4
1 5 1 -2
1 5 4 -3
2 5 4 -4

How can I do this processing efficiently, given that the data set might be large (e.g. 100,000 entries)?

David Arenburg
JoeBlack
  • Are the `NA`s always present as a whole row, so that you need to remove all of them, or do some rows contain both NAs and non-NAs that you don't want to remove? Please clarify – David Arenburg Mar 22 '16 at 20:12
  • @David Arenburg: The NAs are always present as a whole row. – JoeBlack Mar 22 '16 at 20:24
  • See also this: http://stackoverflow.com/questions/1296646/how-to-sort-a-dataframe-by-columns , and next time, please do a minimal search before posting – David Arenburg Mar 22 '16 at 20:27

1 Answer

Another way to do this would be to first remove all rows containing NA values and then order the matrix.

# make a matrix
my_mat <- matrix(c(1,2,1,1,2,5,6,5,2,5,1,3,4,2,4,-2,4,-3,2,-4),
             nrow = 5, ncol = 4)

# add some NA values
my_mat[4,] <- NA

     [,1] [,2] [,3] [,4]
[1,]    1    5    1   -2
[2,]    2    6    3    4
[3,]    1    5    4   -3
[4,]   NA   NA   NA   NA
[5,]    2    5    4   -4


# remove rows that contain any NA; this is safe here because NAs
# always occupy an entire row, as specified in the question
my_mat <- my_mat[complete.cases(my_mat),]

# order by the 4th column
my_mat[order(my_mat[,4], decreasing = TRUE),]

     [,1] [,2] [,3] [,4]
[1,]    2    6    3    4
[2,]    1    5    1   -2
[3,]    1    5    4   -3
[4,]    2    5    4   -4
mfidino
  • This will also remove rows that are not all `NA`s. Try `my_mat[5,3] <- NA ; my_mat[complete.cases(my_mat),]` – David Arenburg Mar 22 '16 at 19:54
  • @DavidArenburg OP didn't mention whether any rows with any `NA` will be all `NA` (and thus this will work) or whether he/she needs to only remove rows that are all-NA (there's where a hyphen would've helped in the question). My reading of the question suggested the same conclusion M_Fidino came to here. – TayTay Mar 22 '16 at 19:57
  • Completely true, but for this application it will work as `NA` values will be present through the entire row (as they specified in the question). – mfidino Mar 22 '16 at 19:57
  • In that case this is just a plain copy/paste from these r-faqs [this](http://stackoverflow.com/questions/4862178/remove-rows-with-nas-in-data-frame) and [this](http://stackoverflow.com/questions/1296646/how-to-sort-a-dataframe-by-columns), and the question should be closed as a dupe instead of copy/pasting the answers here – David Arenburg Mar 22 '16 at 20:04
  • @Tgsmith61591: Actually I mentioned it "Remove all NA rows **(these are always whole rows, not cells)**" – JoeBlack Mar 22 '16 at 20:25
  • @JoeBlack you misunderstood my point. That was what I was contending. – TayTay Mar 22 '16 at 20:27
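
The distinction raised in the comments (dropping rows with any `NA` versus dropping only rows that are entirely `NA`) can be illustrated with a minimal sketch in base R. It reuses the example matrix from the answer and adds one partially missing cell purely as an assumption for illustration, since the question states that this case does not occur in the real data. The names `all_na`, `kept`, and `result` are hypothetical.

```r
# Same example matrix as in the answer above
my_mat <- matrix(c(1,2,1,1,2,5,6,5,2,5,1,3,4,2,4,-2,4,-3,2,-4),
                 nrow = 5, ncol = 4)
my_mat[4, ] <- NA    # a row that is entirely NA
my_mat[5, 3] <- NA   # assumed partial NA, for illustration only

# Flag rows in which every cell is NA
all_na <- rowSums(is.na(my_mat)) == ncol(my_mat)

# Drop only those rows; drop = FALSE keeps a matrix even if one row remains
kept <- my_mat[!all_na, , drop = FALSE]

# Sort by the 4th column in descending order
result <- kept[order(kept[, 4], decreasing = TRUE), , drop = FALSE]
result
```

With this approach row 5 survives despite its missing cell, whereas `complete.cases()` would drop it. When `NA`s always fill whole rows, as in the question, both approaches should give the same result.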