3

I have a data.table in R and want to create a new column. Let's say that I have the date column name saved as a variable and want to append _year to that name in the new column. I'm able to do that the normal route by just specifying the name, but how can I create the new column name using the date_col variable.

Here is what I've tried. The last two, which I want, don't work.

dat = data.table(one = 1:5, two = 1:5, 
                 order_date = lubridate::ymd("2015-01-01","2015-02-01","2015-03-01",
                           "2015-04-01","2015-05-01"))
dat
date_col = "order_date"
dat[,`:=`(OrderDate_year = substr(get(date_col)[!is.na(get(date_col))],1,4))][]
dat[,`:=`(new = substr(noquote(get(date_col))[!is.na(noquote(get(date_col)))],1,4))][]
dat[,`:=`(paste0(date_col, "_year", sep="") = substr(noquote(get(date_col))[!is.na(noquote(get(date_col)))],1,4))][]
dat[,`:=`(noquote(paste0(date_col, "_year", sep="")) = substr(noquote(get(date_col))[!is.na(noquote(get(date_col)))],1,4))][]
smci
  • 32,567
  • 20
  • 113
  • 146
ATMA
  • 1,450
  • 4
  • 23
  • 33

2 Answers2

3

The last two statements return an error message:

dat[,`:=`(paste0(date_col, "_year", sep="") = substr(noquote(get(date_col))[!is.na(noquote(get(date_col)))],1,4))][]
Error: unexpected '=' in "dat[,`:=`(paste0(date_col, "_year", sep="") ="
dat[,`:=`(noquote(paste0(date_col, "_year", sep="")) = substr(noquote(get(date_col))[!is.na(noquote(get(date_col)))],1,4))][]
Error: unexpected '=' in "dat[,`:=`(noquote(paste0(date_col, "_year", sep="")) ="

The correct syntax for calling the :=() function is:

dat[, `:=`(paste0(date_col, "_year", sep = ""), 
           substr(noquote(get(date_col))[!is.na(noquote(get(date_col)))], 1, 4))][]
dat[, `:=`(noquote(paste0(date_col, "_year", sep = "")), 
           substr(noquote(get(date_col))[!is.na(noquote(get(date_col)))], 1, 4))][]

i.e., replace = by ,.


However, assignment syntax and right hand side are far too complicated.

The order_date column is already of class Date:

str(dat)
Classes ‘data.table’ and 'data.frame':    5 obs. of  3 variables:
 $ one       : int  1 2 3 4 5
 $ two       : int  1 2 3 4 5
 $ order_date: Date, format: "2015-01-01" "2015-02-01" ...
 - attr(*, ".internal.selfref")=<externalptr>

In order to extract the year, year() function can be used (either from the data.table package or the lubridate package whatever is loaded last), so no conversion back to character and extraction of the year string is required:

date_col = "order_date"
dat[, paste0(date_col, "_year") := lapply(.SD, year), .SDcols = date_col][]
   one two order_date order_date_year
1:   1   1 2015-01-01            2015
2:   2   2 2015-02-01            2015
3:   3   3 2015-03-01            2015
4:   4   4 2015-04-01            2015
5:   5   5 2015-05-01            2015

Alternatively,

dat[, paste0(date_col, "_year") := year(get(date_col))][]
dat[, `:=`(paste0(date_col, "_year"), year(get(date_col)))][]

work as well.

Uwe
  • 41,420
  • 11
  • 90
  • 134
1

The set function is nice for doing this. Quicker than setting inside the data.table too. Is this what you are after? http://brooksandrew.github.io/simpleblog/articles/advanced-data-table/#fast-looping-with-set

library(data.table)
dat = data.table(one = 1:5, two = 1:5, 
                 order_date = lubridate::ymd("2015-01-01","2015-02-01","2015-03-01",
                           "2015-04-01","2015-05-01"))
dat
date_col = "order_date"

year_col <- paste0(date_col, "_year", sep="")
set(dat, j = year_col, value = substr(dat[[date_col]], 1, 4) )
Jonny Phelps
  • 2,687
  • 1
  • 11
  • 20