
I have a huge vector like this:

12/06/2000     15/07/2001     17/01/2002     25/03/2005     22/05/2005    
 17/01/2006     13/03/2006     05/02/2007     12/02/2008    
4814 Levels: 01/01/2000 01/01/2001 01/01/2002 01/01/2003 01/01/2004 01/01/2005 ... 

Can I subset the vector by period using the levels provided to me? For example, if I choose 01/01/2000 until 31/01/2000, R would give me only the observations pertaining to the year 2000.

A little caveat for anyone reading this: as.Date(levels(a), "%d/%m/%Y") gives you the dates according to the format I provided.

user3083324
  • In this example, do you want only 2000, or only Jan-2000? – jlhoward Jan 08 '14 at 16:32
  • well in this specific case it's kind of redundant since there's only one observation 2000 in this small sample, but say I wanted year 2005 it would give me both 25/03/2005 and 22/05/2005 – user3083324 Jan 08 '14 at 17:00

2 Answers


You will probably need to do something like this (untested code):

a[strftime(as.Date(as.character(a),'%d/%m/%Y'),'%Y') == '2000']
TheComeOnMan
  • Even if this was correct it would be terribly slow on a large dataset (downvote) – Oleg Sklyar Jan 08 '14 at 16:27
  • So what do you suggest, Mr. Oleg S.? – user3083324 Jan 08 '14 at 16:31
  • The question did not ask for a suggestion, it asked if levels can be used. Your answer did not answer the question, and it is very slow code. If you need a suggestion: you can extract the year from a Date without converting to string, which is way faster and more memory efficient than your piece of code. – Oleg Sklyar Jan 08 '14 at 16:33
  • What is that even supposed to mean, 'if levels can be used'? And yes, one can consider the vector to be a plain string and take the last four characters but the 'proper' way to do it, I feel, is this. – TheComeOnMan Jan 08 '14 at 16:36
  • No, one cannot consider it to be a plain string: the data type that the author shows is `factor`, it is stored as a vector of integers, thus taking much less memory on huge data sizes. Converting the whole vector to character is not really advisable here. – Oleg Sklyar Jan 08 '14 at 18:16
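For reference, the suggestion in the comments above (get the year from a Date without going through strings, and avoid converting the whole vector) could be sketched roughly like this. The sample data is made up in the question's dd/mm/yyyy format, and the variable names are hypothetical. It parses only the levels and maps the years back to the observations through the factor's integer codes.

```r
# Hypothetical sample data in the question's dd/mm/yyyy format
a <- factor(c("12/06/2000", "15/07/2001", "25/03/2005", "22/05/2005", "17/01/2006"))

# Parse only the unique levels, not every observation
lev_dates <- as.Date(levels(a), format = "%d/%m/%Y")

# POSIXlt stores the year as an offset from 1900, so no string formatting is needed
lev_years <- as.POSIXlt(lev_dates)$year + 1900

# Map each observation to the year of its level through the integer codes
obs_years <- lev_years[as.integer(a)]

# Keep only the observations from 2005
in_2005 <- a[obs_years == 2005]
```

With many observations and comparatively few distinct dates, the date parsing touches only the levels, which is the point made in the comments.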

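In general, no: levels are not guaranteed to be sorted chronologically. By default factor() sorts levels as strings, so dd/mm/yyyy levels are ordered by day first, as in your output (01/01/2000 01/01/2001 01/01/2002 ...). Therefore, if you use a range of levels, you are likely to get data outside the desired date range. If in your particular case you make sure that the levels are sorted chronologically, then yes.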
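A quick way to check whether the level order matches time order might look like the sketch below. The sample values are made up in the question's dd/mm/yyyy format, and `levels_unsorted` is just an illustrative name.

```r
# Hypothetical sample in the question's dd/mm/yyyy format
a <- factor(c("12/06/2000", "15/07/2001", "17/01/2002", "25/03/2005", "05/02/2007"))

# factor() orders the levels as strings, so the day of the month comes first
levels(a)

# Parse the levels and compare their order against chronological order
lev_dates <- as.Date(levels(a), format = "%d/%m/%Y")

# TRUE means the level order does not match chronological order
levels_unsorted <- is.unsorted(as.numeric(lev_dates))
```

Here the first level is 05/02/2007, the latest date in the sample, so a range of level positions would not correspond to a range of dates.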

Edit: solution for any sorting:

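a = factor(c("2013-12-21","2011-11-28","2000-10-15","2005-08-08","2000-12-31"))

# convert only the levels to Date, not the whole vector
dlevels = as.Date(levels(a))
# indexes of the levels that fall within the desired period
lindex = which(dlevels >= as.Date("2000-01-01") & dlevels <= as.Date("2000-12-31"))
# as.integer(a) returns the integer codes of the factor (same values as a@.Data)
b = a[as.integer(a) %in% lindex]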

> a
[1] 2013-12-21 2011-11-28 2000-10-15 2005-08-08 2000-12-31
Levels: 2000-10-15 2000-12-31 2005-08-08 2011-11-28 2013-12-21
> b
[1] 2000-10-15 2000-12-31 
Levels: 2000-10-15 2000-12-31 2005-08-08 2011-11-28 2013-12-21
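The same idea applied to the dd/mm/yyyy format from the question could be wrapped up as below. This is only a sketch: `subset_by_period` is a hypothetical helper name, the sample values are taken from the question's output, and `droplevels()` is added on the assumption that the unused levels are not wanted in the result.

```r
# Sample values copied from the question's output
a <- factor(c("12/06/2000", "15/07/2001", "17/01/2002", "25/03/2005",
              "22/05/2005", "17/01/2006", "13/03/2006"))

# Hypothetical helper: keep observations whose date lies in [from, to]
subset_by_period <- function(f, from, to, format = "%d/%m/%Y") {
  # parse only the levels, using the format the data was stored in
  lev_dates <- as.Date(levels(f), format = format)
  # positions of the levels inside the requested period
  keep <- which(lev_dates >= as.Date(from) & lev_dates <= as.Date(to))
  # match the integer codes against those positions, then drop unused levels
  droplevels(f[as.integer(f) %in% keep])
}

b <- subset_by_period(a, "2005-01-01", "2005-12-31")
```

Because the comparison is done on parsed dates rather than on level positions, this does not depend on how the levels happen to be sorted.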
Oleg Sklyar
  • How am I supposed to sort levels? – user3083324 Jan 08 '14 at 16:27
  • You cannot. As you say they are given to you by whatever function that provided you with the data. You can only check if they are sorted correctly (meaning: according to a sorting order that you want, e.g. in time). Normally this would not be the case unless your data was originally sorted. If they are sorted however, their index is the integer value that is stored in the datavector, so you can use those indexes to slice that integer vector. From here on I would suggest you do your own thinking :) – Oleg Sklyar Jan 08 '14 at 16:31
  • Do you think this works fast enough? http://stackoverflow.com/questions/9749598/r-obtaining-month-and-year-from-a-date – user3083324 Jan 08 '14 at 17:09
  • Does it matter? It is just a different date formatting, which is specific to my locale. The approach is identical – Oleg Sklyar Jan 08 '14 at 18:13