2

How can I convert a column of integers that represent dates (in `yyyymmdd` form):

       DATE PRCP
1: 19490101   25
2: 19490102    5
3: 19490118   18
4: 19490119  386
5: 19490202   38

to a table like this:

days   month   years   PRCP
Jaap
  • 2
    There are many ways of achieving this. For starters, you could have a look at `?substr` or `?strsplit`. Also, `?weekdays` might be interesting. – coffeinjunky Mar 23 '16 at 10:41
  • 1
    Also, see [this](http://stackoverflow.com/questions/22776381/splitting-numeric-yyyymmdd-column-r) – alexis_laz Mar 23 '16 at 12:10

5 Answers

9

We can use `extract` from the tidyr package, with one regex capture group per new column:

library(tidyr)
extract(df, DATE, into=c('YEAR', 'MONTH', 'DAY'), 
         '(.{4})(.{2})(.{2})', remove=FALSE)
#       DATE YEAR MONTH DAY PRCP
#1 19490101 1949    01  01   25
#2 19490102 1949    01  02    5
#3 19490118 1949    01  18   18
#4 19490119 1949    01  19  386
#5 19490202 1949    02  02   38
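
As a side note to the output above: the new columns come back as character strings ("01", "02"). A minimal sketch, assuming integer columns are wanted instead, passes `convert = TRUE`, which is an existing argument of `tidyr::extract`. The data frame below is rebuilt from the question's sample so the snippet runs on its own.

```r
library(tidyr)

# Sample data copied from the question
df <- data.frame(DATE = c(19490101L, 19490102L, 19490118L, 19490119L, 19490202L),
                 PRCP = c(25L, 5L, 18L, 386L, 38L))

# convert = TRUE applies type conversion to the new columns,
# so "01" becomes the integer 1 instead of staying a string
out <- extract(df, DATE, into = c("YEAR", "MONTH", "DAY"),
               regex = "(.{4})(.{2})(.{2})", remove = FALSE, convert = TRUE)
str(out)
```

Leave `convert` at its default if the zero-padded strings are what you need, for example to paste the parts back together.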
akrun
5

Here's another way using regular expressions:

df <- read.table(header=TRUE, stringsAsFactors=FALSE, text="
DATE PRCP
19490101   25
19490102    5
19490118   18
19490119  386
19490202   38")
dates <- as.character(df$DATE)
res <- t(sapply(regmatches(dates, regexec("(\\d{4})(\\d{2})(\\d{2})", dates)), "[", -1))
res <- structure(as.integer(res), .Dim=dim(res)) # make them integer values
cbind(df, setNames(as.data.frame(res), c("Y", "M", "D"))) # combine with original data frame
#       DATE PRCP    Y M  D
# 1 19490101   25 1949 1  1
# 2 19490102    5 1949 1  2
# 3 19490118   18 1949 1 18
# 4 19490119  386 1949 1 19
# 5 19490202   38 1949 2  2
lukeA
5

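I would advise you to use the lubridate package. The `:=` syntax below comes from data.table, so `df` has to be a data.table:

library(data.table)
library(lubridate)
setDT(df)  # convert df to a data.table by reference; := needs one
df[, DATE := ymd(DATE)]
df[, c("Day", "Month", "Year") := list(day(DATE), month(DATE), year(DATE))]
df[, DATE := NULL]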
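
If `df` is a plain data.frame rather than a data.table, the same lubridate accessors can be used without `:=`. This is only a sketch under that assumption; the sample data is rebuilt from the question, and `d` and `res` are illustrative names.

```r
library(lubridate)

# Sample data copied from the question
df <- data.frame(DATE = c(19490101, 19490102, 19490118, 19490119, 19490202),
                 PRCP = c(25, 5, 18, 386, 38))

# ymd() parses yyyymmdd values (numeric or character) into Date objects
d <- ymd(df$DATE)

# Build the result in the column order the question asks for
res <- data.frame(day = day(d), month = month(d), year = year(d), PRCP = df$PRCP)
res
```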
radiumhead
5

Another option would be to use separate from the tidyr package:

library(tidyr)
separate(df, DATE, c('year','month','day'), sep = c(4,6), remove = FALSE)

which results in:

       DATE year month day PRCP
1: 19490101 1949    01  01   25
2: 19490102 1949    01  02    5
3: 19490118 1949    01  18   18
4: 19490119 1949    01  19  386
5: 19490202 1949    02  02   38

Two options in base R:

1) with substr as said by @coffeinjunky in the comments:

df$year <- substr(df$DATE,1,4)
df$month <- substr(df$DATE,5,6)
df$day <- substr(df$DATE,7,8)

2) with as.Date and format:

df$DATE <- as.Date(as.character(df$DATE),'%Y%m%d')
df$year <- format(df$DATE, '%Y')
df$month <- format(df$DATE, '%m')
df$day <- format(df$DATE, '%d')
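
Both options above return the parts as character strings. A third base R route, not part of the answer itself, is plain integer arithmetic on the `yyyymmdd` number. The sketch below assumes `DATE` is still numeric (so it applies before the `as.Date` conversion in option 2) and rebuilds the sample data so it runs on its own.

```r
# Sample data copied from the question
df <- data.frame(DATE = c(19490101L, 19490102L, 19490118L, 19490119L, 19490202L),
                 PRCP = c(25L, 5L, 18L, 386L, 38L))

# Integer division and modulo peel the parts off the number:
df$year  <- df$DATE %/% 10000L          # 19490118 -> 1949
df$month <- (df$DATE %/% 100L) %% 100L  # 19490118 -> 194901 -> 1
df$day   <- df$DATE %% 100L             # 19490118 -> 18
df
```

This avoids any string handling and yields integers directly, but it does no validation: a value such as 19491340 would silently produce month 13 and day 40.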
Jaap
2

First I would convert the DATE column to Date type using as.Date(), then build the new data.frame using calls to format():

df <- data.frame(DATE = c(19490101, 19490102, 19490118, 19490119, 19490202),
                 PRCP = c(25, 5, 18, 386, 38), stringsAsFactors = FALSE)
df$DATE <- as.Date(as.character(df$DATE), '%Y%m%d')
data.frame(day = as.integer(format(df$DATE, '%d')),
           month = as.integer(format(df$DATE, '%m')),
           year = as.integer(format(df$DATE, '%Y')), PRCP = df$PRCP)
##   day month year PRCP
## 1   1     1 1949   25
## 2   2     1 1949    5
## 3  18     1 1949   18
## 4  19     1 1949  386
## 5   2     2 1949   38
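
One practical reason to go through `as.Date()` rather than slicing the digits is that impossible dates should come back as `NA` instead of passing through unnoticed. A small sketch of that check, using made-up values that are not in the question's data:

```r
# Hypothetical input: the 2nd value is 31 February, the 3rd has month 13
x <- c(19490101, 19490231, 19491301)

d <- as.Date(as.character(x), format = "%Y%m%d")

# Flag the rows that did not parse to a real calendar date
bad <- is.na(d)
x[bad]
```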
bgoldst