I have a data frame (data) which includes a lot of dates. I want to lop off everything from before 1970. I can create a vector of the row indices whose dates fall before 1970:
tmp <- which(data$date < '1970-01-01')
[1] 13446 102876 141199
and I want to create a new table that drops out those three rows. Something like:
data.after.1970 <- data[!tmp, ]
I know I could create a vector of all the incidents after 1970 and match against it with:
tmp <- which(data$date > '1970-01-01')
data.after.1970 <- data[tmp, ]
But I am wondering what syntax I would use to exclude items.
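As a hedged illustration of why the `!tmp` attempt comes back empty, here is a minimal sketch on a made-up data frame (`toy`, its columns, and its values are invented for this example, not the real data). `!` is a logical operator: applied to row numbers it treats every non-zero value as TRUE and negates it, whereas applied to the comparison itself it gives a usable mask.

```r
# Toy stand-in for the real data; names and values are invented for illustration.
toy <- data.frame(
  id   = 1:5,
  date = as.Date(c("1965-03-01", "1980-06-15", "1969-12-31",
                   "1991-01-01", "2001-09-09"))
)
cutoff <- as.Date("1970-01-01")

tmp <- which(toy$date < cutoff)   # row positions of the pre-1970 dates

# !tmp negates numbers, not a condition: every non-zero position becomes FALSE,
# so this subset has no rows at all.
empty <- toy[!tmp, ]

# Negating the logical comparison (without which()) keeps the other rows.
toy.after.1970 <- toy[!(toy$date < cutoff), ]
```

The toy dates contain no missing values; with NA dates the logical mask would also contain NA and would need extra handling.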
UPDATE
I finally just did this:
tmp <- which(data$date > as.Date('1970-01-01'))
data.after.1970 <- data[tmp, ]
and took a closer look at it. which(data$date < as.Date('1970-01-01')) gets three results, but nrow(data) - nrow(data.after.1970) shows that I dropped 45 rows. summary(data$date) cleared that up:
summary(data$date)
        Min.      1st Qu.       Median         Mean      3rd Qu.         Max.         NA's
"1933-07-01" "1989-01-25" "1992-07-09" "1992-05-03" "1996-06-10" "2006-09-14"         "42"
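The 42 NA dates account for the other dropped rows (3 + 42 = 45), because which() only returns the positions where the comparison is TRUE. Since my goal was to get a second dataset so I could compare my results with the bad dates excluded, I actually do want to drop the rows with NA values as well.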
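The row count can be reproduced on a small invented vector (the dates below are made up): a comparison against NA yields NA, and which() reports only TRUE positions, so a missing date lands in neither the "before" nor the "after" index set.

```r
# Invented dates, one of them missing.
d <- as.Date(c("1965-03-01", NA, "1980-06-15", "1991-01-01"))
cutoff <- as.Date("1970-01-01")

before <- which(d < cutoff)   # the NA comparison is NA, so position 2 is skipped
after  <- which(d > cutoff)   # position 2 is skipped here as well

# Rows lost when keeping only `after`: the pre-cutoff rows plus the NA rows.
dropped <- length(d) - length(after)
```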
I still want to know what syntax I would use to exclude the rows listed in a numeric index vector rather than include them.
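One possible approach, sketched here on invented data (`toy` and the variable names are hypothetical), is to negate the index vector with a minus sign, since negative positions are excluded in R subsetting. The sketch also shows a caveat that seems worth knowing: when the index vector is empty, the negative form selects no rows, and setdiff() on the row positions is one way to avoid that.

```r
# Toy stand-in for the real data; names and values are invented for illustration.
toy <- data.frame(
  id   = 1:5,
  date = as.Date(c("1965-03-01", "1980-06-15", "1969-12-31",
                   "1991-01-01", "2001-09-09"))
)

idx  <- which(toy$date < as.Date("1970-01-01"))   # positions to exclude
kept <- toy[-idx, ]                               # negative positions are dropped

# Caveat: if nothing matches, -integer(0) is still integer(0),
# so the subset has no rows instead of all of them.
none     <- which(toy$date < as.Date("1900-01-01"))
all.gone <- toy[-none, ]

# setdiff() on the row positions gives the expected result in both cases.
safe <- toy[setdiff(seq_len(nrow(toy)), none), ]
```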