I'm trying to automate some analysis for the company where I'm working and I need to check if the data in each row was created in April or not. As a pretty much complete beginner with R, this is no easy task for me. The date is a string presented as follows:

01-Apr-2017 12:34:56

To create a data frame with only observations from April 2017 in them, I am trying to use subset. My code is as follows:

df2=subset(df,identical(substr(Closed,4,11),"Apr-2017"))

When I run this, it gives me a new data frame with 0 rows, which suggests that it never found an entry in the factor Closed in which characters 4-11 were equal to "Apr-2017". However, when I manually test a row that has this date, it returns TRUE, just as the subset function would need:

dats.gmdm$Closed[131] #This returns "14-Apr-2017 14:39:28"
substr(dats.gmdm$Closed[131],4,11) #This returns "Apr-2017"
identical(substr(dats.gmdm$Closed[131],4,11),"Apr-2017") #And this returns TRUE

Theoretically, I should have a set with only these instances, but it just gives me an empty data frame with 22 factors (same as in the original dataset).

Is there another way to do this, or how is my code wrong? If not, why is it not theoretically possible?

Val
Charlie
  • You might want to show the first 2-3 lines of data using `dput(df)` which will make it possible for people to build and test an appropriate answer. – sconfluentus Jun 01 '17 at 20:58

1 Answer

You could easily pull out the rows of your data frame that match the string "Apr-2017" (or other date) using grepl() and then use those as row indices to extract from the master df. Assuming that the data are in dats.gmdm and Closed is the variable with the date, then something like this should work:

 date_code <- "Apr-2017"
 idx <- grepl(date_code, dats.gmdm[,"Closed"])
 df2 <- dats.gmdm[idx,]
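
As for why the original subset() call probably comes back empty: identical() compares two whole objects and returns a single TRUE or FALSE, so comparing the full column of substrings to one string gives one FALSE, which subset() then applies to every row. The sketch below illustrates this on made-up data; the data frame `dats`, its `id` column, and the sample dates are hypothetical stand-ins for dats.gmdm, and `fixed = TRUE` is an optional extra that treats the pattern as a literal string rather than a regular expression.

```r
# Hypothetical toy data in the same date format as the question
dats <- data.frame(
  id = 1:4,
  Closed = c("15-Sep-2015 10:28:56", "14-Apr-2017 14:39:28",
             "18-Apr-2017 13:37:20", "08-Oct-2015 09:54:22"),
  stringsAsFactors = TRUE  # mimic the factor column described in the question
)

# Work on plain character strings rather than the factor
closed <- as.character(dats$Closed)

# identical() compares the whole vector to one string: a single FALSE
whole <- identical(substr(closed, 4, 11), "Apr-2017")

# == is element-wise: one TRUE/FALSE per row
keep_eq <- substr(closed, 4, 11) == "Apr-2017"

# grepl() is also element-wise; fixed = TRUE matches the literal text
keep_grepl <- grepl("Apr-2017", closed, fixed = TRUE)

# Keep only the matching rows
df2 <- dats[keep_grepl, ]
```

Under these assumptions, either logical vector (`keep_eq` or `keep_grepl`) can be used as the row index, and replacing identical() with == in the original subset() call should behave the same way.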
Mark S
  • Thanks for your comment! When I try that, it gives me an error in every "column" (not sure why it's not referring to them as rows), as shown below: 'idx <- grepl(date_code, dats.gmdm[,dats.gmdm$Closed]) Error: Columns `15-Sep-2015 10:28:56`, `08-Oct-2015 09:54:22`, `18-Apr-2017 13:37:20`' It then proceeds to ignore all this and just return df2 as the original dats.gmdm set. Huh. – Charlie Jun 02 '17 at 13:54
  • @Charlie Can you post a min reproducible ex so we can see the data? See https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – Mark S Jun 02 '17 at 17:54