
Say I have a matrix with a column of dates, running from "20080101" in the first row to "20100101" in the last row. How can I get the rows from date "20080901" to "20091031"? I'm working in R.

Example:

2008010106 a
2008010112 b
2008010118 f
2008010206 e
2008010200 w
2008010212 a
2008010218 b
2008010300 f
2008010406 e
2008010306 a
2008010312 b
2008010318 f
2008010400 r
2008010412 e

The first column holds dates (the last two digits are the hour of the day). The second column contains letters.

Now I want to get the rows from "2008010200" to "2008010412".

Note that the dates are not in sequential order.

IRTFM
starfunk

3 Answers


I like xts subsetting for this kind of thing.

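library(xts)
# Example matrix: a numeric YYYYMMDD date column plus two random value columns
m <- cbind(date=seq(20080101, 20080131, 1),
           matrix(runif(31*2), ncol=2))
# Build an xts object indexed by the parsed dates (the date column itself is dropped)
x <- xts(m[, -1], as.Date(as.character(m[, 1]), '%Y%m%d'))

# Range subsetting with a "from/to" string
x['20080110/20080120']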

# 2008-01-10 0.4819532 0.9406910
# 2008-01-11 0.5447225 0.5776338
# 2008-01-12 0.5614482 0.4152551
# 2008-01-13 0.2356413 0.9192496
# 2008-01-14 0.9759123 0.8141157
# 2008-01-15 0.2912074 0.3847100
# 2008-01-16 0.2185788 0.6909651
# 2008-01-17 0.6544894 0.3287306
# 2008-01-18 0.1319076 0.6527686
# 2008-01-19 0.6391880 0.5336123
# 2008-01-20 0.6915097 0.4842339

The above example returns the rows of x whose dates fall between 10 January 2008 and 20 January 2008, inclusive of both end dates.
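The question's stamps also carry an hour (YYYYMMDDHH), which a Date index like the one above would drop. A minimal base R sketch of the same parse-then-subset idea on the question's data, assuming the stamps are UTC (the time zone is an assumption, and the names stamps, vals, tm and result are illustrative only):

```r
# The question's example data: unsorted YYYYMMDDHH stamps and letters
stamps <- c("2008010106", "2008010112", "2008010118", "2008010206",
            "2008010200", "2008010212", "2008010218", "2008010300",
            "2008010406", "2008010306", "2008010312", "2008010318",
            "2008010400", "2008010412")
vals <- c("a", "b", "f", "e", "w", "a", "b", "f", "e", "a", "b", "f", "r", "e")

# Parse to POSIXct so comparisons are on real date-times.
# tz = "UTC" is an assumption; it also sidesteps daylight-saving gaps.
tm   <- as.POSIXct(stamps, format = "%Y%m%d%H", tz = "UTC")
from <- as.POSIXct("2008010200", format = "%Y%m%d%H", tz = "UTC")
to   <- as.POSIXct("2008010412", format = "%Y%m%d%H", tz = "UTC")

# Inclusive range test; keeps the original (unsorted) row order
keep <- tm >= from & tm <= to
result <- data.frame(stamp = stamps[keep], value = vals[keep],
                     stringsAsFactors = FALSE)

# Optional: sort chronologically, since the input rows were not in order
sorted <- result[order(tm[keep]), ]
sorted
```

The parsed tm vector could equally be passed as order.by to xts(), which sorts by its index and would then allow string-range subsetting on the hourly data.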

jbaums
  • Would xts handle `'20080110':'20080120'` ? – IRTFM Oct 30 '14 at 18:20
  • @BondedDust - no, it won't accept that. See [`?subset.xts`](http://www.inside-r.org/packages/cran/xts/docs/.subset.xts) for details. – jbaums Oct 30 '14 at 19:10
  • Ah. Probably because ":" is a common time delimiter. – IRTFM Oct 30 '14 at 19:40
  • Yeah that makes sense. Colons can be used when time is used in a subsetting operation (but as part of the definition of start/end points, not to denote a sequence, e.g. as mentioned in [this post](http://stackoverflow.com/q/11871572)). – jbaums Oct 30 '14 at 22:43

A logical expression evaluated on the n-th column (say, the 10th) can be used as the "i" argument to "[".

n <- 10  # index of the column holding the "dates"
shorterM <- M[ M[,n] >= "20080901" & M[,n] <= "20091031" , ]

This should work for either a matrix or a data frame, as long as these "dates" are character values in that fixed-width format: for equal-width digit strings, lexical order matches chronological order. The ">=", "<=" and "&" operators are all vectorized, so this is "logical indexing". You do take a risk in posting questions with no code, since most respondents think that is your job and may not test their answers (as I have not). Next time, post a small example, preferably with the dput function, and specify what the right answer is. Then you get tested code, everyone is happy, and you get no close votes ... unless of course this is a duplicate, which is certainly possible.
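The fixed-width condition matters when the bounds are YYYYMMDD but the column is YYYYMMDDHH, as in the question: a longer string sorts after its own prefix, so an upper bound of "20091031" would exclude every hour of 31 October. A minimal sketch that compares only the date part; rows_between_days and the toy matrix M are hypothetical, not from the answer:

```r
# Hypothetical helper: keep rows whose YYYYMMDDHH stamp in column `col`
# falls on a day from `from` to `to` (both YYYYMMDD strings, inclusive).
rows_between_days <- function(M, col, from, to) {
  day <- substr(as.character(M[, col]), 1, 8)  # drop the two hour digits
  # All three strings now have 8 digits, so string order is date order.
  # Without substr, e.g. "2009103118" <= "20091031" is FALSE.
  M[day >= from & day <= to, , drop = FALSE]
}

# Toy data: one stamp before, two inside, and one after the range
M <- cbind(stamp = c("2008083123", "2008090100", "2009103118", "2009110100"),
           value = c("x", "y", "z", "w"))
rows_between_days(M, "stamp", "20080901", "20091031")
```

Because it only uses "[" on rows and columns, the same helper works on a data frame as well as a matrix.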

Using the example from the question as a worked example:

> DD <- read.table(text="2008010106 a
+ 2008010112 b
+ 2008010118 f
+ 2008010206 e
+ 2008010200 w
+ 2008010212 a
+ 2008010218 b
+ 2008010300 f
+ 2008010406 e
+ 2008010306 a
+ 2008010312 b
+ 2008010318 f
+ 2008010400 r
+ 2008010412 e", colClasses="character")

> (shorterDD <- DD[ DD[,1] >= "2008010200" & DD[,1] <= "2008010412" , ])
           V1 V2
4  2008010206  e
5  2008010200  w
6  2008010212  a
7  2008010218  b
8  2008010300  f
9  2008010406  e
10 2008010306  a
11 2008010312  b
12 2008010318  f
13 2008010400  r
14 2008010412  e
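Logical indexing keeps the rows in their original order, so the unsorted stamps stay unsorted in shorterDD. A small sketch that rebuilds the same data and then sorts the subset chronologically with order(), relying on the same fixed-width property (sortedDD is an illustrative name):

```r
# Rebuild the question's data so this runs on its own
DD <- read.table(text = "2008010106 a
2008010112 b
2008010118 f
2008010206 e
2008010200 w
2008010212 a
2008010218 b
2008010300 f
2008010406 e
2008010306 a
2008010312 b
2008010318 f
2008010400 r
2008010412 e", colClasses = "character")

# Same logical indexing as above
shorterDD <- DD[DD[, 1] >= "2008010200" & DD[, 1] <= "2008010412", ]

# Equal-width digit strings sort chronologically, so order() is enough
sortedDD <- shorterDD[order(shorterDD[, 1]), ]
sortedDD
```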
IRTFM

You could use between, a convenience function from dplyr, with m from @jbaums' post:

  library(dplyr) 
  m[between(m[,"date"], 20080110, 20080120),]
  #       date                      
  #[1,] 20080110 0.19957458 0.22814565
  #[2,] 20080111 0.44428667 0.24073101
  #[3,] 20080112 0.86218249 0.68175459
  #[4,] 20080113 0.31706619 0.48679117
  #[5,] 20080114 0.09629562 0.66931400
  #[6,] 20080115 0.81436380 0.35013160
  #[7,] 20080116 0.34077661 0.54417985
  #[8,] 20080117 0.71414292 0.52569811
  #[9,] 20080118 0.84745961 0.90069540
 #[10,] 20080119 0.04145519 0.05394461
 #[11,] 20080120 0.65274477 0.08029292
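dplyr documents between() as a shortcut for x >= left & x <= right, so the same selection works without the dependency. A minimal base R sketch; between_base is a hypothetical name, and the matrix mirrors the layout of @jbaums' m with an arbitrary seed for the random columns:

```r
# Hypothetical base R stand-in for dplyr::between(): inclusive on both ends
between_base <- function(x, left, right) x >= left & x <= right

# Same layout as @jbaums' m: numeric YYYYMMDD dates plus two random columns
set.seed(1)  # arbitrary seed; only the random value columns depend on it
m <- cbind(date = seq(20080101, 20080131, 1),
           matrix(runif(31 * 2), ncol = 2))

# Fixed-width YYYYMMDD numbers order the same way as the dates they encode
picked <- m[between_base(m[, "date"], 20080110, 20080120), ]
picked[, "date"]
```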
akrun