0

I have a dataframe that looks like this with column names as dates;

2013_11 | 2013_12 | 2014_01 | 2014_02 | 2014_03 |

 NA | NA | 3  | 3  | N  |
  2 | 2  | 3  | NA | NA |
 NA | NA | NA | NA | NA |

I need to write some sort of logic function that will filter out only the rows I am looking for. I need to pull only rows that did NOT have a number for any month in 2013 (first two columns), but DID have at least 1 number in any of the 2014 columns.

So the code would only pull back the first row for me;

NA | NA | 3  | 3  | N  |

I can't figure out the most efficient way to do this, as I have about 8 million rows.

IRTFM
  • 258,963
  • 21
  • 364
  • 487
chattrat423
  • 603
  • 2
  • 11
  • 24

3 Answers3

2

You could try

indx1 <- grep('2013', colnames(df))
indx2 <- grep('2014', colnames(df))
df[!rowSums(!is.na(df[indx1]))&!!rowSums(!is.na(df[indx2])),]
#   2013_11 2013_12 2014_01 2014_02 2014_03
#1      NA      NA       3       3       N

Or you could use

i1 <- Reduce(`&`, lapply(df[indx1], function(x) is.na(x)))
i2 <- Reduce(`&`, lapply(df[indx2], function(x) !is.na(x)))
df[i1 &i2,]
# 2013_11 2013_12 2014_01 2014_02 2014_03
#1      NA      NA       3       3       N

data

df <- structure(list(`2013_11` = c(NA, 2L, NA), `2013_12` = c(NA, 2L, 
NA), `2014_01` = c(3L, 3L, NA), `2014_02` = c(3L, NA, NA), `2014_03` = c("N", 
NA, NA)), .Names = c("2013_11", "2013_12", "2014_01", "2014_02", 
"2014_03"), class = "data.frame", row.names = c(NA, -3L))
akrun
  • 874,273
  • 37
  • 540
  • 662
0

Have you considered using grep. I would create a function to do this such as the following. Using R's any, all, is.na and if statements in a for loop.

grep_function <- function(src, condition1, condition2) {
    for(i in 1:length(src[[1]])){
        data_condition1 <- src[i, grepl(condition1, names(src))]
        data_condition2 <- src[i, grepl(condition2, names(src))]
        if(all(is.na(data_condition1) && any(!is.na(data_condition2)))) {
            // do something here to each individual observation
        } else {
            // do something for those that do not meet your criterea
        }
    }
}

Example: grep_function(your-data-here, "2013", "2014")

Daniel Robertson
  • 1,354
  • 13
  • 22
0

Or you could use SQL (it is a little bit verbose, but maybe more readable for some people):

require('sqldf')

a=data.frame("2013_11"=c(NA,2,NA), "2013_12"=c(NA,2,NA), "2014_01" =c(3,3,NA),
             "2014_02" =c(3,NA,NA) ,"2014_03" =c(NA,NA,NA))

sqldf("select * from a where 
        case when X2013_11 is null then 0 else 1 end +
        case when X2013_12 is null then 0 else 1 end = 0 
        and
        case when X2014_01 is null then 0 else 1 end +
        case when X2014_02 is null then 0 else 1 end +
        case when X2014_03 is null then 0 else 1 end > 0
      ")

 X2013_11 X2013_12 X2014_01 X2014_02 X2014_03
       NA       NA        3        3       NA
arturro
  • 1,598
  • 1
  • 10
  • 13