I have a dataset with 10 columns, one of which is a date in the following format:

10-MAR-12 00.00.00.000000000

I would like to convert this into a format that R reads as a date rather than as a string, displayed like this:

10/03/12

I would also like an additional column that says what day of the week it is.

I would then like to filter out certain days or dates to create a subset of my data.

I am a beginner in R, so any help is appreciated.

user1407670

1 Answer

Take a look at ?strptime for the formatting options, and at as.Date or as.POSIXct for the functions that do the conversion. Also, don't be surprised if your question is downvoted or closed, since this is a common question and answers can be found on SO or with a quick Google search.

Specifically:

format(as.Date(tolower('10-MAR-12 00.00.00.000000000'), format='%d-%b-%y'), format='%d/%m/%y')

should give you the formatting you're looking for. Note that the outer format() returns a character string; if you want an actual Date object, take off the outer format().
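
As a minimal sketch of that difference, using the sample value from the question: the variable names `raw`, `d` and `shown` are only illustrative, and the `Sys.setlocale` call is an assumption added so that `%b` matches English month abbreviations regardless of the session's locale.

```r
# Assumption: force the C locale so %b recognises English month abbreviations.
invisible(Sys.setlocale("LC_TIME", "C"))

# Sample value from the question; the format only consumes the date part,
# so the trailing "00.00.00.000000000" is ignored.
raw <- "10-MAR-12 00.00.00.000000000"

# Inner call only: the result is a real Date object.
d <- as.Date(tolower(raw), format = "%d-%b-%y")
class(d)

# Outer format(): the result is a character string in dd/mm/yy form.
shown <- format(d, format = "%d/%m/%y")
class(shown)
```

Keeping the `Date` object for calculations and calling `format()` only when displaying it tends to make later steps such as sorting and filtering simpler.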

Justin
  • Thanks. I was trying to use strptime, but I am more stuck on adding the result back into my dataset and adding extra columns. Bear in mind I am a COMPLETE beginner to R (I started using it yesterday). – user1407670 May 22 '12 at 14:14
  • There are many [excellent resources](http://cran.r-project.org/doc/manuals/R-intro.pdf) out there. I would start there and work through some examples. Assuming your data is a `data.frame` called `dat` (check with `str`), you can add columns like this: `dat$newcol <- 'foo'`. But search for an intro to R and read a bunch. You'll save yourself hours of headaches! – Justin May 22 '12 at 14:22
  • I tried the following and it printed a load of dates in the same format as before, still with lots of 0's! :( `format(depdateold,as.Date(tolower('10-MAR-12 00.00.00.000000000'),format='%d-%b-%y'), format='%d/%m/%y')` – user1407670 May 22 '12 at 14:29
  • Your date vector is called depdateold? Put it in place of the long date I wrote out: `depdatenew <- as.Date(tolower(depdateold), format='%d-%b-%y')`. Then check `str(depdatenew)` to see the format of your new vector. – Justin May 22 '12 at 14:34
  • Filter? You'll have to be more specific, and you should probably ask a separate question. But I'd suggest reading more and maybe starting with a simpler problem. – Justin May 22 '12 at 14:35
  • Cool, that worked. I get the format "2012-03-10" "2012-03-10" etc. Now to filter? I have 10 dates that I wish to exclude from the analysis: 5 are consecutive, and the other 5 are consecutive but in a different month... – user1407670 May 22 '12 at 14:36
  • As I've said, these are all questions you can find answers to online, and you would probably learn a lot more about how R works by reading and experimenting on your own. `depdatenew[!depdatenew %in% dates.to.remove]` will do what you're asking, where `dates.to.remove` is the vector of 10 dates you want to exclude. – Justin May 22 '12 at 14:46
  • Thanks. I tried `newdata=subset(olddata,newdate<"2012-12-13" | newdate>2012-12-24, select=c(newdate,sc,ow))` and get the following error: `Error in as.POSIXct.numeric(X[[2L]], ...) : 'origin' must be supplied`. Is there something wrong with my date format? – user1407670 May 22 '12 at 14:51
  • Something like that, but this is really a new question. Please ask it on the main board so people can see it. If you think my answer to this question was correct, you can accept it by clicking the check mark. Also, have a read of [this post](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) about writing good questions. – Justin May 22 '12 at 14:57
  • Cheers, Justin. It was all in the original question, but I will try again. – user1407670 May 22 '12 at 14:59
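
The steps discussed in the comments can be put together in one rough sketch. The data frame `dat`, its `sc` column values and the specific dates are hypothetical stand-ins (only the name `depdateold` comes from the thread), and the `Sys.setlocale` call is an assumption so that month and weekday names are in English. The error in the `subset` attempt is probably caused by the unquoted `2012-12-24`, which R evaluates as the subtraction 2012 - 12 - 24 = 1976 rather than as a date; wrapping both bounds in `as.Date()` avoids that.

```r
# Assumption: force the C locale so month/weekday names are English.
invisible(Sys.setlocale("LC_TIME", "C"))

# Hypothetical stand-in for the asker's data.
dat <- data.frame(
  depdateold = c("10-MAR-12 00.00.00.000000000",
                 "11-MAR-12 00.00.00.000000000",
                 "12-MAR-12 00.00.00.000000000",
                 "02-APR-12 00.00.00.000000000"),
  sc = c(1, 2, 3, 4),
  stringsAsFactors = FALSE
)

# Add the converted date as a new column, then derive the day of the week.
dat$depdatenew <- as.Date(tolower(dat$depdateold), format = "%d-%b-%y")
dat$weekday <- weekdays(dat$depdatenew)

# Option 1: drop an explicit set of dates (seq() builds consecutive days).
dates.to.remove <- seq(as.Date("2012-03-11"), by = "day", length.out = 2)
kept <- dat[!dat$depdatenew %in% dates.to.remove, ]

# Option 2: drop a date range; note that both bounds are real Date objects.
outside <- subset(dat,
                  depdatenew < as.Date("2012-03-11") |
                  depdatenew > as.Date("2012-03-12"))
```

Both options keep the same rows in this toy example; `%in%` is the more direct choice when the excluded dates do not form a single contiguous range.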