
I have a dataset (Crime) with 6,847,944 observations. I am trying to subset it to only the crimes that occurred in 2016. The dates are in the "Date" column. I have tried all of the following:

#change dates to proper format#
Crime$Date = as.Date(Crime$Date, format = "%m/%d/%y")

#filter crimes to 2016#

ATTEMPT 1: Crime16 = subset(Crime$Date = as.Date("2016"))

RESULT 1: Error: unexpected '=' in "Crime16 = subset(Crime$Date ="

ATTEMPT 2: Crimes_2016 <- Crime[year(Date)==2016,]

RESULT 2: Error in as.POSIXlt.default(x, tz = tz(x)) : do not know how to convert 'x' to class “POSIXlt”
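Attempt 2 has the right idea, but `[` does not look up column names inside the data frame the way subset() does, so `Date` there is not `Crime$Date`. The error suggests year() received something other than the Date column. A minimal sketch on a hypothetical stand-in data frame (`crime_demo`), using a base R year extraction so it runs without extra packages:

```r
# Hypothetical stand-in for the Crime data, with a Date column of class "Date"
crime_demo <- data.frame(
  Date = as.Date(c("2016-03-01", "2015-07-04", "2016-12-31")),
  Type = c("THEFT", "BATTERY", "ROBBERY")
)

# Inside [ ], the column must be referenced explicitly (crime_demo$Date).
# as.POSIXlt()$year counts years since 1900, so add 1900 for the calendar year.
crime_year <- as.POSIXlt(crime_demo$Date)$year + 1900
crime_2016 <- crime_demo[crime_year == 2016, ]
print(crime_2016)
```

With lubridate loaded, the equivalent would be `crime_demo[year(crime_demo$Date) == 2016, ]`.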

ATTEMPT 3: Crimes_2016 = subset(Crime, Date >=2016/1/1 & Date <= 2016/31/12)

RESULT 3: Creates data frame, but contains no observations.
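Why attempt 3 returns zero rows: without quotes, 2016/1/1 is plain arithmetic (2016) and 2016/31/12 is about 5.42. A Date is stored as a count of days since 1970-01-01, so the condition asks for dates on or after day 2016 (1975-07-10) and on or before day 5.42 (1970-01-06), which no date satisfies. A minimal sketch, again on a hypothetical `crime_demo`, comparing against real Date objects instead:

```r
# Hypothetical stand-in for the Crime data
crime_demo <- data.frame(
  Date = as.Date(c("2016-03-01", "2015-07-04", "2016-12-31")),
  Type = c("THEFT", "BATTERY", "ROBBERY")
)

# Without quotes these are divisions, not dates
print(2016/1/1)    # 2016
print(2016/31/12)  # about 5.42

# Compare the Date column with Date objects for the first and last day of 2016
crime_2016 <- subset(crime_demo,
                     Date >= as.Date("2016-01-01") & Date <= as.Date("2016-12-31"))
print(crime_2016)
```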

ATTEMPT 4: morecrimes = subset(Crime, Date == 2016)

RESULT 4: Creates data frame, but contains no observations.
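Attempt 4 fails for a related reason: `Date == 2016` compares the stored day count with 2016, so it can only match 1975-07-10. A minimal sketch (hypothetical `crime_demo` data) that extracts the year as a number before comparing:

```r
# Hypothetical stand-in for the Crime data
crime_demo <- data.frame(
  Date = as.Date(c("2016-03-01", "2015-07-04", "2016-12-31")),
  Type = c("THEFT", "BATTERY", "ROBBERY")
)

# A Date is the number of days since 1970-01-01; day 2016 is 1975-07-10
print(as.Date(2016, origin = "1970-01-01"))  # "1975-07-10"

# Turn each date into its year as an integer, then compare with 2016
crime_2016 <- subset(crime_demo, as.integer(format(Date, "%Y")) == 2016)
print(crime_2016)
```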

ATTEMPT 5: Crimes.2016 = selectByDate(Crime$Date = 2016)

RESULT 5: Error: unexpected '=' in "Crimes.2016 = selectByDate(Crime$Date ="
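Attempts 1 and 5 fail at the parser: inside a function call, `=` names an argument, and `Crime$Date` is not a valid argument name. A comparison needs `==`. A minimal sketch with subset() on a hypothetical `crime_demo` data frame standing in for Crime:

```r
# Hypothetical stand-in for the Crime data
crime_demo <- data.frame(
  Date = as.Date(c("2016-03-01", "2015-07-04", "2016-12-31")),
  Type = c("THEFT", "BATTERY", "ROBBERY")
)

# subset() takes the data frame first and a logical condition second.
# The condition is evaluated inside the data frame, so Date is the column.
crime_2016 <- subset(crime_demo, format(Date, "%Y") == "2016")
print(crime_2016)
```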

MarianD

1 Answer


Without a reproducible example dataset I cannot be sure what your data looks like, but take the following data frame as a test:

x <- data.frame(
  "Date" = as.Date(c("2016-01-01", "2015-05-12", "2016-06-16"), format = "%Y-%m-%d"),
  "Crime" = LETTERS[1:3])

Which gives:

> x
        Date Crime
1 2016-01-01     A
2 2015-05-12     B
3 2016-06-16     C

This can be subset with a logical vector. format(x$Date, "%Y") == "2016" reformats each date as just its year and compares that with "2016"; using the resulting vector as the row index of the data frame returns the rows where it is TRUE:

> x[format(x$Date, "%Y") == "2016", ]
        Date Crime
1 2016-01-01     A
3 2016-06-16     C
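To make the logical index visible, a short sketch of the intermediate steps on the same test data frame `x` (the names `years` and `keep` are just for illustration):

```r
x <- data.frame(
  "Date" = as.Date(c("2016-01-01", "2015-05-12", "2016-06-16"), format = "%Y-%m-%d"),
  "Crime" = LETTERS[1:3])

years <- format(x$Date, "%Y")  # "2016" "2015" "2016"
keep <- years == "2016"        # TRUE FALSE TRUE
print(which(keep))             # 1 3
print(x[keep, ])               # rows 1 and 3
```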

Alternatively, you could use dplyr's filter(), which, like subset(), lets you refer to the column by name:

library(dplyr)
# Route 1. Call filter() with the data frame as its first argument
filter(x, format(Date, "%Y") == "2016")
# Route 2. The same call written with the pipe
x %>% filter(format(Date, "%Y") == "2016")
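One more thing worth checking is the conversion step in the question: `%y` in `format = "%m/%d/%y"` reads a two-digit year. If the raw strings have four-digit years (an assumption, since the question does not show the raw data), `%y` consumes only "20", the rest of the string is ignored, and every date ends up in 2020, so no 2016 filter can match. A minimal sketch with made-up strings (`raw` is hypothetical):

```r
raw <- c("01/15/2016", "05/12/2015")  # assumed raw format, not from the question

# %y reads two digits ("20") and ignores the remaining characters
print(as.Date(raw, format = "%m/%d/%y"))  # "2020-01-15" "2020-05-12"

# %Y reads the full four-digit year
print(as.Date(raw, format = "%m/%d/%Y"))  # "2016-01-15" "2015-05-12"
```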
rg255