
How can I remove the first characters from a variable, especially if this variable contains special characters? For example, I have the following column:

Date
01/01/2009
01/01/2010
01/01/2011
01/01/2012

I need a new column like the following:

Date
2009
2010
2011
2012
Braiam
hbtf.1046
  • Convert to 'Date' class and use `format` to extract the 'year' – akrun Apr 12 '16 at 08:49
  • or `gsub(".*/","",df$Date)` – mtoto Apr 12 '16 at 08:51
  • or `substr(as.character(....), 7, 10)` – jogo Apr 12 '16 at 08:51
  • `lubridate::year` should also do the trick once the data is in 'Date' format as suggested by @akrun. – fdetsch Apr 12 '16 at 08:54
  • The cleanest solution is to coerce that variable to `Date` and use either `format` or other functions to extract parts of it. For example, `x <- as.Date("01/01/2009", format = "%m/%d/%Y"); lubridate::year(x)`. – Roman Luštrik Apr 12 '16 at 08:57
  • Possible duplicate of [Extract month and year from a zoo::yearmon object](http://stackoverflow.com/questions/9749598/extract-month-and-year-from-a-zooyearmon-object) – zx8754 Apr 12 '16 at 09:36
  • `data.table::year()` is now also available. Look [here](https://stackoverflow.com/a/63850606/4742889). – andschar Jan 14 '21 at 11:20
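The two string-based suggestions in the comments can be tried on the sample data like this. This is a minimal sketch: the data frame name `df` and the column names `Year_gsub` and `Year_substr` are hypothetical, and both approaches assume every entry has the fixed `xx/xx/yyyy` layout shown in the question. Both return character vectors, not numbers.

```r
# Sample data from the question (the name `df` is assumed)
df <- data.frame(Date = c("01/01/2009", "01/01/2010", "01/01/2011", "01/01/2012"),
                 stringsAsFactors = FALSE)

# gsub(): delete everything up to and including the last "/"
df$Year_gsub <- gsub(".*/", "", df$Date)

# substr(): keep characters 7 to 10 (assumes exactly 10 characters per entry)
df$Year_substr <- substr(as.character(df$Date), 7, 10)

df
```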

7 Answers


As discussed in the comments, this can be achieved by converting the entry into Date format and extracting the year, for instance like this:

```r
format(as.Date(df1$Date, format="%d/%m/%Y"),"%Y")
```
RHertel
  • Why the hell does this work? If I look at `format()`'s documentation, there is nothing said about the second argument that you provided. How should I understand this? – scarface Jan 21 '18 at 16:09
  • From `?format`: "format is a **generic function**. Apart from the methods described here there are methods for dates (**see format.Date**)". From `?format.Date`: "## S3 method for class 'Date' format(x, ...) [where ... denotes] further arguments to be passed from or to other methods, **including format for as.character and as.Date** methods.". See also the first example in `?format.Date`. – RHertel Jan 21 '18 at 18:46
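As the comment explains, `format()` is generic, so calling it on a `Date` object dispatches to `format.Date()`, and the second argument is the format string. A small sketch that makes the dispatch visible and stores the year as an integer column; the sample values in `df1` and the column name `Year` are assumptions for illustration:

```r
# Sample data; `df1` matches the name used in the answer
df1 <- data.frame(Date = c("01/01/2009", "01/01/2010", "01/01/2011", "01/01/2012"),
                  stringsAsFactors = FALSE)

d <- as.Date(df1$Date, format = "%d/%m/%Y")  # parse day/month/year strings

class(d)                  # "Date", so format(d, ...) dispatches to format.Date()
format(d, "%Y")           # "2009" "2010" "2011" "2012" (character)
format(d, format = "%Y")  # the same call with the argument named

# Keep the year as a number rather than a string
df1$Year <- as.integer(format(d, "%Y"))
df1
```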
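```r
library(lubridate)
b <- c("01/01/2009", "01/01/2010", "01/01/2011", "01/01/2012")
a <- mdy(b)  # parse month/day/year strings into Date objects
year(a)      # extract the year
```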

See the lubridate vignette (https://cran.r-project.org/web/packages/lubridate/vignettes/lubridate.html) and the accompanying paper (http://vita.had.co.nz/papers/lubridate.pdf).
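`mdy()` assumes the strings are month-first. The sketch below uses a hypothetical vector `b` whose last entry is changed to 12/31/2012 so that the month/day order actually matters, and shows the companion extractors plus the day-first parser `dmy()`:

```r
library(lubridate)

# Hypothetical input; the last value makes month/day order unambiguous
b <- c("01/01/2009", "01/01/2010", "01/01/2011", "12/31/2012")

a <- mdy(b)   # month/day/year strings -> Date
year(a)       # 2009 2010 2011 2012
month(a)      # 1 1 1 12
day(a)        # 1 1 1 31

# For day-first strings such as "31/12/2012", dmy() is the matching parser
dmy("31/12/2012")
```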

Ajay Ohri

Once you have converted your variable to Date:

```r
date <- as.Date('10/30/2018', '%m/%d/%Y')
```

you can extract the elements you want into new variables, such as the year:

```r
year <- as.numeric(format(date, '%Y'))
```

or the month:

```r
month <- as.numeric(format(date, '%m'))
```
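Base R also exposes the components directly through `as.POSIXlt()`, without a round trip through strings. A minimal sketch using the same example date; note the offsets `as.POSIXlt()` uses for years and months:

```r
date <- as.Date("10/30/2018", "%m/%d/%Y")

lt <- as.POSIXlt(date)  # broken-down time: a list-like record of components

lt$year + 1900          # years are counted from 1900: 2018
lt$mon + 1              # months are 0-based (0 = January): 10
lt$mday                 # day of the month: 30
```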
invictus

If all your dates have the same width, you can put them in a vector and use `substring()`:

```r
a <- c("01/01/2009", "01/01/2010", "01/01/2011")
substring(a, 7, 10)  # keep only the characters from position 7 to position 10
```

Output:

```
[1] "2009" "2010" "2011"
```
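Because `substring()` silently returns the wrong characters when an entry has a different width (for example `1/1/2009`), it can help to check the width assumption first and then convert the result to numbers, as one of the comments suggests. A minimal sketch:

```r
a <- c("01/01/2009", "01/01/2010", "01/01/2011")

# substring() is only correct if every entry has the same 10-character layout
stopifnot(all(nchar(a) == 10))

years <- as.numeric(substring(a, 7, 10))  # numeric instead of character
years
# [1] 2009 2010 2011
```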
Alexander
  • I agree, but you can easily transform this into a numeric vector, no? `as.numeric(substring(a,7,10))` – Dr. Fabian Habersack Oct 05 '18 at 14:44
  • Dates should not be converted to strings or numbers; they are inherently a 'number of x's since a fixed time point' (days for `Date`, seconds for `POSIXct`) and are only displayed as human-readable strings, so they should strictly not be manipulated as strings. – skoh Jan 14 '19 at 21:14

This is more advice than a specific answer, but my suggestion is to convert dates to date variables immediately rather than keeping them as strings. That way you can use date (and time) functions on them instead of fragile string-manipulation workarounds.

As pointed out in the comments, the lubridate package has convenient extraction functions such as `year()`, `month()`, `day()` and `wday()`.

For some projects, I have found it helpful to split dates into their components from the start: create year, month, day-of-month and day-of-week variables up front. This simplifies summaries, tables and graphs, because the extraction code is kept separate from the summary/table/graph code, and if you need to change it, you only have to change it in one place.
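A minimal base R sketch of that idea: convert once, then derive the component columns a single time so that later summaries can reuse them. The data and column names are hypothetical, and `as.POSIXlt()$wday` is used for the day of the week because, unlike `weekdays()`, it does not depend on the locale (0 = Sunday).

```r
# Hypothetical day/month/year strings
df <- data.frame(Date = c("01/01/2009", "15/06/2010", "31/12/2011"),
                 stringsAsFactors = FALSE)

# Convert to Date immediately instead of keeping strings around
df$Date <- as.Date(df$Date, format = "%d/%m/%Y")

# Derive the components once, up front
lt <- as.POSIXlt(df$Date)
df$year  <- lt$year + 1900L  # years are stored as an offset from 1900
df$month <- lt$mon + 1L      # months are 0-based
df$mday  <- lt$mday          # day of the month
df$wday  <- lt$wday          # day of the week, 0 = Sunday

# Summaries, tables and plots can now reuse these columns
table(df$year)
```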

Barry DeCicco

If you are using the date package, this can be done fairly easily.

```r
library(date)
Date <- c("01/01/2009", "01/01/2010", "01/01/2011", "01/01/2012")
Date <- as.date(Date)
Date
# [1] 1Jan2009 1Jan2010 1Jan2011 1Jan2012
date.mdy(Date)$year
# [1] 2009 2010 2011 2012

## be aware that these are now integers and thus different methods may be invoked:
str(date.mdy(Date)$year)
# int [1:4] 2009 2010 2011 2012
summary(Date)
#     First      Last   
# "1Jan2009" "1Jan2012" 
summary(date.mdy(Date)$year)
#    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#    2009    2010    2010    2010    2011    2012 
```
gung - Reinstate Monica

For some time now, you can also rely solely on the data.table package and its `IDate` class plus associated functions (see `?as.IDate`).

```r
library(data.table)

a <- c("01/01/2009", "01/01/2010", "01/01/2011")
year(as.IDate(a, format = "%d/%m/%Y"))  # as.IDate() and year() both come from data.table
```
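To add the year as a new column, as the question asks, the same functions can be used inside a `data.table` together with `:=`, which adds the column by reference. A minimal sketch; the table name `DT` and the column name `Year` are assumptions:

```r
library(data.table)

DT <- data.table(Date = c("01/01/2009", "01/01/2010", "01/01/2011"))

# Parse into IDate, extract the year, and add it as a column by reference
DT[, Year := year(as.IDate(Date, format = "%d/%m/%Y"))]
DT
```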
andschar