I have a data set (in CSV format) that looks something like this:

Date Auto_Index Realty_Index
29-Dec-02 1742.2 1000
2-Jan-03 1748.85 1009.67
3-Jan-03 1758.66 1041.45
4-Jan-03 1802.9 1062.11
5-Jan-03 1797.45 1047.56
...
...
...
26-Nov-12 1665.5 248.75
27-Nov-12 1676.3 257.6
29-Nov-12 1696.7 266.9
30-Nov-12 1682.8 266.55
3-Dec-12 1702.6 270.4

I want to analyse this data over different periods in R. Is there a way to split it into periods such as 2002-2005, 2006-2009 and 2009-2012?

user1317221_G
smh
  • Yes. Yes, there is. [What have you tried?](http://whathaveyoutried.com) – Dec 10 '12 at 13:40
  • I think the question is covered by these: 1) http://stackoverflow.com/questions/9407622/subsetting-a-dataframe-for-a-specified-month-and-year 2) http://stackoverflow.com/questions/9554507/i-am-unable-to-specify-the-correct-date-format-to-subset-a-dataframe-in-r – user1317221_G Dec 10 '12 at 14:02

2 Answers

As @user1317221_G proposed, you can use `cut` on the date column (the method that gets dispatched is `cut.POSIXt`). Here's how:

d
        Date Auto_Index Realty_Index
1  29-Dec-02    1742.20      1000.00
2   2-Jan-03    1748.85      1009.67
3   3-Jan-03    1758.66      1041.45
4   4-Jan-03    1802.90      1062.11
5   5-Jan-03    1797.45      1047.56
6  26-Nov-12    1665.50       248.75
7  27-Nov-12    1676.30       257.60
8  29-Nov-12    1696.70       266.90
9  30-Nov-12    1682.80       266.55
10  3-Dec-12    1702.60       270.40

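# First step: convert the Date column to POSIXct.
# strptime() on its own returns POSIXlt, so wrap it in as.POSIXct().
# Note that %b (abbreviated month name) is parsed according to the current locale.
d$Date <- as.POSIXct(strptime(d$Date, format = "%d-%b-%y"))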

# Then define your break points for your periods:
breaks <- as.POSIXct(c("2002-01-01","2006-01-01","2010-01-01","2013-01-01"))

# Then cut
d$Period <- cut(d$Date, breaks=breaks, 
                        labels=c("2002-2005","2006-2009","2010-2012"))
d
         Date Auto_Index Realty_Index    Period
1  2002-12-29    1742.20      1000.00 2002-2005
2  2003-01-02    1748.85      1009.67 2002-2005
3  2003-01-03    1758.66      1041.45 2002-2005
4  2003-01-04    1802.90      1062.11 2002-2005
5  2003-01-05    1797.45      1047.56 2002-2005
6  2012-11-26    1665.50       248.75 2010-2012
7  2012-11-27    1676.30       257.60 2010-2012
8  2012-11-29    1696.70       266.90 2010-2012
9  2012-11-30    1682.80       266.55 2010-2012
10 2012-12-03    1702.60       270.40 2010-2012
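
Once the `Period` column exists, the per-period analysis the question asks about can be done with base R tools. The following is only a sketch under a few assumptions: it rebuilds the ten sample rows by hand (with ISO dates and `tz = "UTC"` to sidestep locale and time zone issues), and the names `by_period` and `period_means` are made up for illustration.

```r
# Rebuild the ten sample rows shown above (dates typed in ISO format for this sketch)
d <- data.frame(
  Date = as.POSIXct(c("2002-12-29", "2003-01-02", "2003-01-03", "2003-01-04",
                      "2003-01-05", "2012-11-26", "2012-11-27", "2012-11-29",
                      "2012-11-30", "2012-12-03"), tz = "UTC"),
  Auto_Index = c(1742.20, 1748.85, 1758.66, 1802.90, 1797.45,
                 1665.50, 1676.30, 1696.70, 1682.80, 1702.60),
  Realty_Index = c(1000.00, 1009.67, 1041.45, 1062.11, 1047.56,
                   248.75, 257.60, 266.90, 266.55, 270.40)
)

# Same break points and labels as above
breaks <- as.POSIXct(c("2002-01-01", "2006-01-01", "2010-01-01", "2013-01-01"),
                     tz = "UTC")
d$Period <- cut(d$Date, breaks = breaks,
                labels = c("2002-2005", "2006-2009", "2010-2012"))

# One data frame per period; drop = TRUE discards periods with no rows
by_period <- split(d, d$Period, drop = TRUE)

# Mean of each index within each period
period_means <- aggregate(cbind(Auto_Index, Realty_Index) ~ Period,
                          data = d, FUN = mean)
print(period_means)
```

With the full data set, `by_period[["2006-2009"]]` would hold the middle period as well; in this ten-row sample it is empty and therefore dropped.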
plannapus

If you want to operate on the periods as numbers (rather than text), then this might help:

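# Break years chosen so that the intervals are 2002-2005, 2006-2009 and 2010-2012
br <- c(2002, 2006, 2010, 2013)
# Extract the year as a number, then look up which interval it falls in
df$Int <- findInterval(as.numeric(format(as.Date(df$Date, format = "%d-%b-%y"), "%Y")), br)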
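
To see how `findInterval` assigns years to intervals, here is a small sketch. It assumes break years of 2002, 2006, 2010 and 2013 (so that interval 1 covers 2002-2005), and the `years` and `labels` vectors are invented purely for illustration.

```r
# Assumed break years: each interval is closed on the left, open on the right
br <- c(2002, 2006, 2010, 2013)

# A few illustrative years, including the ones sitting on the boundaries
years <- c(2002, 2003, 2005, 2006, 2009, 2010, 2012)

idx <- findInterval(years, br)
print(idx)
# 1 1 1 2 2 3 3

# The integer index can be mapped back to text labels if needed
labels <- c("2002-2005", "2006-2009", "2010-2012")
print(data.frame(Year = years, Int = idx, Period = labels[idx]))
```

A year equal to a break value belongs to the interval that starts there, which is why 2006 lands in interval 2 and 2010 in interval 3.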
A_K
  • +1, a nice, straightforward alternative. You made a typo, though: it should be `format='%d-%b-%y'`, not `format='%d-%b-%Y'`. – plannapus Dec 10 '12 at 14:40