11

I need to convert date (m/d/y format) into 3 separate columns on which I hope to run an algorithm.(I'm trying to convert my dates into Julian Day Numbers). Saw this suggestion for another user for separating data out into multiple columns using Oracle. I'm using R and am throughly stuck about how to code this appropriately. Would A1,A2...represent my new column headings, and what would the format difference be with the "update set" section?

 update <tablename> set A1 = substr(ORIG, 1, 4), 
                       A2 = substr(ORIG, 5, 6), 
                       A3 = substr(ORIG, 11, 6), 
                       A4 = substr(ORIG, 17, 5); 

I'm trying hard to improve my skills in R but cannot figure this one...any help is much appreciated. Thanks in advance... :)

Joey
  • 181
  • 2
  • 4
  • 10

5 Answers5

26

I use the format() method for Date objects to pull apart dates in R. Using Dirk's datetext, here is how I would go about breaking up a date into its constituent parts:

datetxt <- c("2010-01-02", "2010-02-03", "2010-09-10")
datetxt <- as.Date(datetxt)
df <- data.frame(date = datetxt,
                 year = as.numeric(format(datetxt, format = "%Y")),
                 month = as.numeric(format(datetxt, format = "%m")),
                 day = as.numeric(format(datetxt, format = "%d")))

Which gives:

> df
        date year month day
1 2010-01-02 2010     1   2
2 2010-02-03 2010     2   3
3 2010-09-10 2010     9  10

Note what several others have said; you can get the Julian dates without splitting out the various date components. I added this answer to show how you could do the breaking apart if you needed it for something else.

Gavin Simpson
  • 170,508
  • 25
  • 396
  • 453
9

Given a text variable x, like this:

> x
[1] "10/3/2001"

then:

> as.Date(x,"%m/%d/%Y")
[1] "2001-10-03"

converts it to a date object. Then, if you need it:

> julian(as.Date(x,"%m/%d/%Y"))
[1] 11598
attr(,"origin")
[1] "1970-01-01"

gives you a Julian date (relative to 1970-01-01).

Don't try the substring thing...

See help(as.Date) for more.

Spacedman
  • 92,590
  • 12
  • 140
  • 224
6

Quick ones:

  1. Julian date converters already exist in base R, see eg help(julian).

  2. One approach may be to parse the date as a POSIXlt and to then read off the components. Other date / time classes and packages will work too but there is something to be said for base R.

  3. Parsing dates as string is almost always a bad approach.

Here is an example:

datetxt <- c("2010-01-02", "2010-02-03", "2010-09-10")
dates <- as.Date(datetxt) ## you could examine these as well
plt <- as.POSIXlt(dates)  ## now as POSIXlt types
plt[["year"]] + 1900      ## years are with offset 1900
#[1] 2010 2010 2010
plt[["mon"]] + 1          ## and months are on the 0 .. 11 intervasl
#[1] 1 2 9
plt[["mday"]] 
#[1]  2  3 10
df <- data.frame(year=plt[["year"]] + 1900, 
                  month=plt[["mon"]] + 1, day=plt[["mday"]])
df
#  year month day
#1 2010     1   2
#2 2010     2   3
#3 2010     9  10

And of course

julian(dates)
#[1] 14611 14643 14862
#attr(,"origin")
#[1] "1970-01-01"
user2100721
  • 3,557
  • 2
  • 20
  • 29
Dirk Eddelbuettel
  • 360,940
  • 56
  • 644
  • 725
6

To convert date (m/d/y format) into 3 separate columns,consider the df,

df <- data.frame(date = c("01-02-18", "02-20-18", "03-23-18"))
df
      date
1 01-02-18
2 02-20-18
3 03-23-18

Convert to date format

df$date <- as.Date(df$date, format="%m-%d-%y")
df
        date
1 2018-01-02
2 2018-02-20
3 2018-03-23

To get three seperate columns with year, month and date,

library(lubridate)
df$year <- year(ymd(df$date))
df$month <- month(ymd(df$date)) 
df$day <- day(ymd(df$date))
df
        date year month day
1 2018-01-02 2018     1   2
2 2018-02-20 2018     2  20
3 2018-03-23 2018     3  23

Hope this helps.

AMS
  • 151
  • 1
  • 9
1

Hi Gavin: another way [using your idea] is:

The data-frame we will use is oilstocks which contains a variety of variables related to the changes over time of the oil and gas stocks. The variables are:

colnames(stocks)
"bpV"    "bpO"    "bpC"    "bpMN"   "bpMX"   "emdate" "emV"    "emO"    "emC"  
"emMN"   "emMN.1" "chdate" "chV"    "cbO"    "chC"    "chMN"   "chMX" 

One of the first things to do is change the emdate field, which is an integer vector, into a date vector.

realdate<-as.Date(emdate,format="%m/%d/%Y")

Next we want to split emdate column into three separate columns representing month, day and year using the idea supplied by you.

> dfdate <- data.frame(date=realdate)
year=as.numeric (format(realdate,"%Y"))
month=as.numeric (format(realdate,"%m"))
day=as.numeric (format(realdate,"%d"))

ls() will include the individual vectors, day, month, year and dfdate. Now merge the dfdate, day, month, year into the original data-frame [stocks].

ostocks<-cbind(dfdate,day,month,year,stocks)
colnames(ostocks)

"date"   "day"    "month"  "year"   "bpV"    "bpO"    "bpC"    "bpMN"   "bpMX"   "emdate" "emV"    "emO"    "emC"    "emMN"   "emMX" "chdate" "chV"   
"cbO"    "chC"    "chMN"   "chMX"

Similar results and I also have date, day, month, year as separate vectors outside of the df.

Roman Luštrik
  • 69,533
  • 24
  • 154
  • 197