I have a data frame of 4352 observations and 21 columns. The first column is a date vector and the other 20 columns are numeric vectors (representing stock prices). Since there are no trades on some days (i.e. weekends and holidays), some observations have NAs in columns 2:21.

The following code gives me a logical data frame indicating where there is an NA; the test data frame has the same dimensions as the input table.

test <- is.na(prices[, 2:21]) %>% as.data.frame()

However, when I do the following, the result has 48052 observations, with additional row names such as NA.40755:

test <- prices[is.na(prices[, 2:21]) == 0, ]

But when I use a comma instead of a colon when slicing the columns, I seem to get the desired output (i.e. 2970 observations):

test <- prices[is.na(prices[, 2, 21]) == 0, ]

Therefore my question is: why do I have to slice with [, 2, 21] instead of [, 2:21]?

SlavicDoomer

2 Answers

is.na(prices[, 2:21]) is a logical matrix of TRUE/FALSE values. I am not sure what you were trying to do by comparing it with == 0, because that returns a logical matrix of the same dimensions. You need to consolidate the values in each row with rowSums so that you have only one value per row.
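
To illustrate the point about the logical matrix, here is a minimal sketch on a made-up data frame (the values and the column names a and b are assumptions, not the asker's data). It suggests why the row count balloons: when the matrix is used as a row index it is treated as one long logical vector, and every TRUE past the last row yields an all-NA row. It also suggests that in [, 2, 21] the third positional argument is matched to drop, so only column 2 is actually tested.

```r
# Toy stand-in for the asker's data (hypothetical values)
prices <- data.frame(
  date = as.Date("2020-01-03") + 0:3,
  a = c(1, NA, NA, 4),
  b = c(10, NA, 30, 40)
)

# Logical matrix: TRUE where a price is present (4 rows x 2 columns)
m <- is.na(prices[, 2:3]) == 0

# As a row index, the matrix acts as one long logical vector, so the
# result has one row per TRUE; positions past row 4 become NA rows
bad <- prices[m, ]
nrow(bad) == sum(m)

# The third positional argument is matched to `drop`, not to a column,
# so this returns column 2 as a plain vector
col2 <- prices[, 2, 3]
identical(col2, prices[, 2])

# Filtering on that vector keeps the rows where column 2 alone is not NA
only_col2 <- prices[!is.na(col2), ]
```

So the comma version appears to work only because it silently ignores every column except the second.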

If you want to drop the rows in which all values are NA, you can use:

prices <- prices[rowSums(!is.na(prices[, 2:21])) > 0, ]
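
As a rough sketch of what that line does, the same idea can be stepped through on a small made-up data frame (values and column names are assumptions for illustration only). rowSums counts the non-NA prices in each row, giving one number per row, and only rows with a count above zero are kept.

```r
# Toy stand-in for the asker's data (hypothetical values)
prices <- data.frame(
  date = as.Date("2020-01-03") + 0:3,
  a = c(1, NA, NA, 4),
  b = c(10, NA, 30, 40)
)

# Number of non-NA prices per row: one value for each row
present <- rowSums(!is.na(prices[, 2:3]))

# Keep rows with at least one price; row 2 (all NA) is dropped
kept <- prices[present > 0, ]
```

Here row 3 survives because it still has one price; changing the condition to present == 2 would instead keep only rows with no NA at all.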
Ronak Shah

We can use Reduce with lapply from base R:

prices <- prices[!Reduce(`&`, lapply(prices[2:21], is.na)),]
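
A small sketch, again on made-up data (values and column names are assumptions), of how this works: lapply produces one logical vector per price column, and Reduce combines them with & so that a row is flagged only when every column is NA. On this toy input it should agree with the rowSums approach.

```r
# Toy stand-in for the asker's data (hypothetical values)
prices <- data.frame(
  date = as.Date("2020-01-03") + 0:3,
  a = c(1, NA, NA, 4),
  b = c(10, NA, 30, 40)
)

# One logical vector per column, combined elementwise with &:
# TRUE only where every price column is NA
all_na <- Reduce(`&`, lapply(prices[2:3], is.na))

# Equivalent check: the NA count per row equals the number of price columns
all_na_rowsums <- unname(rowSums(is.na(prices[2:3])) == 2)

# Drop the rows that are NA in every price column
kept <- prices[!all_na, ]
```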
akrun