0

This question is related to my previous one, Subsetting a dataframe for a specified month and year

I use the command

sales <- read.csv("mysales.csv", colClasses="character")

to obtain a dataframe which looks like this:

    row     date            pieces       income
    1       21/11/2011      49           220.5
    2       22/11/2011      58           261
    3       23/11/2011      23           103.5
    4       24/11/2011      57           256.5

I want to create a subset for November 2011 using the code provided in my previous question, but various attempts have failed. So for a check I wrote in the console:

format.Date(sales[1,1], "%Y")=="2011"

and the answer was:

[1] FALSE

Moreover:

format(as.Date(sales[1,1]), "%d/%m/%Y")
[1] "20/11/21"

How can I, at least, know what is happening with date format?

What should I do to subset the dataframe using code like:

subset(sales, format.Date(date, "%m")=="11" & format.Date(date, "%Y")=="2011")

Sorry if my question is not clear, but the problem I am facing is not clear to me either.

(Edit to correct formatting)

Community
  • 1
  • 1
John James Pork
  • 181
  • 1
  • 2
  • 7
  • This isn't much different from your other question. Your dates are not really dates, they are just strings. If you read @TylerRinker's answer carefully, you'll see that the first thing he did is replace your date column with `Date` objects using `as.Date`. – flodel Mar 04 '12 at 11:58

2 Answers2

1

Currently, what you think are dates really are just strings of characters. You need to turn them into Date objects using as.Date, and for that specify the format they are in (%d/%m/%Y) or R won't guess it for you.

sales <- data.frame(date   = c("21/11/2011", "21/11/2011", "23/11/2012", "24/11/2012"),
                    pieces = c(49,58,23,57,34),
                    income = c(220.5, 261, 103.5, 256.5, 112))
class(sales$date)
# [1] "factor"
sales$date <- as.Date(sales$date, "%d/%m/%Y")
class(sales$date)
# [1] "Date"
subset(sales, format.Date(date, "%m")=="11" & format.Date(date, "%Y")=="2011")
#         date pieces income
# 1 2011-11-21     49  220.5
# 2 2011-11-21     58  261.0
flodel
  • 87,577
  • 21
  • 185
  • 223
  • This method works. More than just supplying lines of code to copy & paste, your answer made me understand about how dates are treated. – John James Pork Mar 06 '12 at 06:11
0

Just to make the answer more general, I have added another month too.

The working data looks like:

           date pieces income
1 21/11/2011     49  220.5
2 22/11/2011     58  261.0
3 23/11/2011     23  103.5
4 24/11/2011     57  256.5
5 23/12/2011     50  240.0

There are many ways to do this. One I use regularly is strsplit and lapply.

sale$date1<-as.Date(sale$date, "%d/%m/%Y")  # let R know the date format

# Create a column of months by splitting the dates into 3 parts and grabbing the middle 
# part which is months
sale$months<-lapply(strsplit(as.character(sale$date1), "-"), function(x){x[2]}) 

# finally keep only the data for the month of November 

required<-subset(sale[which(sale$months==11),], select=-c(months,date1))

        date pieces income
1 21/11/2011     49  220.5
2 22/11/2011     58  261.0
3 23/11/2011     23  103.5
4 24/11/2011     57  256.5
user1234357
  • 319
  • 1
  • 3