
I have a data frame df with a factor column, ReleaseDate, containing data like this:

Apr 10, 2001
Apr 10, 2007
...

I want to create a new column, ReleaseYear, containing only the year, which is always the last four characters of ReleaseDate.

How do I get the last four characters from ReleaseDate for ReleaseYear?

Username

3 Answers


Here are two options: one uses year() from the lubridate package, the other uses a regular expression:

library(lubridate)
year(as.Date("Apr 10, 2001", format = "%b %d, %Y"))
[1] 2001

library(stringr)
str_extract("Apr 10, 2001", "\\d{4}$")
[1] "2001"
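Applied to the whole factor column, the regex idea might look like this minimal sketch. It assumes every ReleaseDate ends in a four-digit year, uses an invented third date for illustration, and swaps stringr for base R's sub() so nothing extra needs installing. as.integer() turns the year into a number rather than a string.

```r
# Example data shaped like the question's: ReleaseDate stored as a factor
# (the third date is made up for illustration)
df <- data.frame(ReleaseDate = factor(c("Apr 10, 2001", "Apr 10, 2007", "Jan 5, 2010")))

# Work on the character labels, not the factor's integer codes
dates <- as.character(df$ReleaseDate)

# Capture the trailing four digits; a string without them would become NA
# (with a warning) after as.integer()
df$ReleaseYear <- as.integer(sub("^.*([0-9]{4})$", "\\1", dates))

print(df$ReleaseYear)
```

Avoiding as.Date() here also sidesteps the fact that %b only matches month abbreviations in the current locale.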
Psidom

This is one option. gsub removes everything up to and including the first ", ", leaving only the year.

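# Example data; gsub also accepts a factor column (it is coerced to character)
df <- data.frame(ReleaseDate = c("Apr 10, 2001", "Apr 10, 2007"))
# Strip everything up to and including the first ", "
df$ReleaseYear <- gsub("^.*?, ", "", df$ReleaseDate)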

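This is an alternative. It uses fixed positions, so it only works when every date has the same width (a two-digit day, as in "Apr 10, 2001").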

df$ReleaseYear <- substr(df$ReleaseDate, 9, 12)
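To illustrate the fixed-width assumption, here is a small sketch with two invented dates, the second having a one-digit day. Positions 9 to 12 then cut into the year, and checking nchar() against the expected width of 12 is one way to flag such rows before trusting the result.

```r
# Invented dates: the second has a one-digit day, so it is only 11 characters
x <- c("Apr 10, 2001", "Jan 5, 2010")

# Fixed positions 9-12 assume every string is 12 characters long
years <- substr(x, 9, 12)
print(years)

# Flag the entries that break the 12-character assumption
bad <- x[nchar(x) != 12]
print(bad)
```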

One more option.

library(stringr)
df$ReleaseYear <- str_sub(df$ReleaseDate, -4)
milan

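Use substr(x, start, stop). For the last four characters, start at nchar(x) - 3 and stop at nchar(x). Because ReleaseDate is a factor, convert it to character first; nchar() does not accept factors.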

x <- as.character(df$ReleaseDate)
df$ReleaseYear <- substr(x, nchar(x) - 3, nchar(x))
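If this comes up often, the same substr() logic can be wrapped in a small reusable function. The name last_n_chars is hypothetical (it is not part of base R); it mirrors what stringr's str_sub(x, -n) returns, and the dates below are invented for illustration.

```r
# Hypothetical helper, not part of base R: last n characters of each element
last_n_chars <- function(x, n) {
  x <- as.character(x)                  # factors must be converted before nchar()
  start <- pmax(nchar(x) - n + 1, 1)    # strings shorter than n are returned whole
  substr(x, start, nchar(x))
}

df <- data.frame(ReleaseDate = factor(c("Apr 10, 2001", "Jan 5, 2010")))
df$ReleaseYear <- last_n_chars(df$ReleaseDate, 4)
print(df$ReleaseYear)
```

Counting from the end of each string means dates of different widths still yield the full year.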
chungtinhlakho