-3

I have a large dataset with thousands of dates in the ymd format. I want to convert this column so that way there are three individual columns by year, month, and day. There are literally thousands of dates so I am trying to do this with a single code for the entire dataset.

camille
  • 16,432
  • 18
  • 38
  • 60
Leah H
  • 1
  • 1
    please post the output of `dput(head(my_data))` inside your question. – Cettt Nov 15 '19 at 16:18
  • Are the dates stored as `20170131` or have you already parsed them in to R using `lubridate::ymd()`? – Gavin Simpson Nov 15 '19 at 16:18
  • Does this answer your question? [Extract date elements from POSIXlt and put into data frame in R](https://stackoverflow.com/questions/8126537/extract-date-elements-from-posixlt-and-put-into-data-frame-in-r) – camille Nov 15 '19 at 16:39
  • Also see https://stackoverflow.com/q/9749598/5325862 – camille Nov 15 '19 at 16:42

4 Answers4

2

You can use the year(), month(), and day() extractors in lubridate for this. Here's an example:

library('dplyr')
library('tibble')
library('lubridate')

## create some data
df <- tibble(date = seq(ymd(20190101), ymd(20191231), by = '7 days'))

which yields

> df
# A tibble: 53 x 1
   date      
   <date>    
 1 2019-01-01
 2 2019-01-08
 3 2019-01-15
 4 2019-01-22
 5 2019-01-29
 6 2019-02-05
 7 2019-02-12
 8 2019-02-19
 9 2019-02-26
10 2019-03-05
# … with 43 more rows

Then mutate df using the relevant extractor function:

df <- mutate(df,
             year  = year(date),
             month = month(date),
             day   = day(date))

This results in:

> df
# A tibble: 53 x 4
   date        year month   day
   <date>     <dbl> <dbl> <int>
 1 2019-01-01  2019     1     1
 2 2019-01-08  2019     1     8
 3 2019-01-15  2019     1    15
 4 2019-01-22  2019     1    22
 5 2019-01-29  2019     1    29
 6 2019-02-05  2019     2     5
 7 2019-02-12  2019     2    12
 8 2019-02-19  2019     2    19
 9 2019-02-26  2019     2    26
10 2019-03-05  2019     3     5
# … with 43 more rows

If you only want the new three columns, use transmute() instead of mutate().

Gavin Simpson
  • 170,508
  • 25
  • 396
  • 453
1

Using lubridate but without having to specify a separator:

library(tidyverse)
df <- tibble(d = c('2019/3/18','2018/10/29'))
df %>%
  mutate(
    date = lubridate::ymd(d),
    year = lubridate::year(date),
    month = lubridate::month(date),
    day = lubridate::day(date)
  )

Note that you can change the first entry from ymd to fit other formats.

AHart
  • 448
  • 3
  • 10
1

A slighlty different tidyverse solution that requires less code could be:

Code

tibble(date = "2018-05-01") %>% 
  mutate_at(vars(date), lst(year, month, day))

Result

# A tibble: 1 x 4
  date        year month   day
  <chr>      <dbl> <dbl> <int>
1 2018-05-01  2018     5     1
davsjob
  • 1,882
  • 15
  • 10
0
#Data
d = data.frame(date = c("2019-01-01", "2019-02-01", "2012/03/04"))

library(lubridate)    

cbind(d,
      read.table(header = FALSE,
                 sep = "-",
                 text = as.character(ymd(d$date))))
#        date   V1 V2 V3
#1 2019-01-01 2019  1  1
#2 2019-02-01 2019  2  1
#3 2012/03/04 2012  3  4

OR

library(dplyr)
library(tidyr)
library(lubridate)
d %>%
    mutate(date2 = as.character(ymd(date))) %>%
    separate(date2, c("year", "month", "day"), "-")
#        date year month day
#1 2019-01-01 2019    01  01
#2 2019-02-01 2019    02  01
#3 2012/03/04 2012    03  04
d.b
  • 32,245
  • 6
  • 36
  • 77