I have a data frame with 4000 observations; this is head(both):

     kön             gdk   age fbkurs     pers   stterm
1    man          FALSE    69  FALSE 1941-12-23 2011-01-19
2    man             NA    70  FALSE 1942-02-11 2012-01-19
3 kvinna             NA    65  FALSE 1942-06-04 2007-09-01
4 kvinna           TRUE    68  FALSE 1943-04-04 2011-09-01
5 kvinna             NA    65  FALSE 1943-10-30 2008-09-01
6    man          FALSE    70   TRUE 1944-01-27 2013-09-01

I want to create a new column based on the column named 'stterm'. stterm contains dates that I would rather label as, for example, VT10, VT11, etc. I would like to call the new column regyear.

I have tried to enter:

regyear <- factor(both$stterm, levels = c("2007-09-01"="HT07" "2008-09-01"="HT09" "2009-01-19"="VT09" "2009-09-01"="HT09" "2010-01-19"="VT10" "2010-09-01"="HT10" "2011-01-19"="VT11"
                                       "2011-09-01"="HT11" "2012-01-19"="VT12" "2012-09-01"="HT12" "2013-01-19"="VT13" "2013-09-01"="HT13" "2014-01-19"="VT14"))

but when I do, I get the following error message:

Error: unexpected string constant in "regyear<- factor(both$stterm, levels = c("2007-09-01"='HT07' "2008-09-01""

What should I do to fix this?

Henrik
malin
  • You need to give `factor` two vectors, one called levels, `levels=c("2007-09-01", "2008-09-01", ...)`, and another one called labels, `labels=c("HT07", "HT09", ...)`. – Ernest A Apr 12 '15 at 09:55
  • First of all, you have to separate your labels with commas (e.g. `c("2007-09-01"="HT07", "2008-09-01"="HT09" ...`). But anyway, I noticed that two different dates are mapped to the same string (HT09), so is your goal simply to map one or more dates to another string? If so, you don't strictly need to use factors. – digEmAll Apr 12 '15 at 09:58
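
The two comments above can be sketched roughly as follows. This is only an illustration: the `stterm` vector is a toy stand-in built from the dates in head(both), the names `lev`, `lab` and `lookup` are made up, and it assumes that "2008-09-01" was meant to be "HT08" rather than "HT09".

```r
# Toy stand-in for both$stterm (dates taken from head(both) in the question)
stterm <- c("2011-01-19", "2012-01-19", "2007-09-01",
            "2011-09-01", "2008-09-01", "2013-09-01")

# One vector of levels and a parallel vector of labels, matched by position.
# Assumption: "2008-09-01" should read "HT08", not "HT09".
lev <- c("2007-09-01", "2008-09-01", "2011-01-19",
         "2011-09-01", "2012-01-19", "2013-09-01")
lab <- c("HT07", "HT08", "VT11", "HT11", "VT12", "HT13")

regyear <- factor(stterm, levels = lev, labels = lab)
as.character(regyear)
# [1] "VT11" "VT12" "HT07" "HT11" "HT08" "HT13"

# Without factors: a named character vector used as a lookup table
lookup <- setNames(lab, lev)
unname(lookup[stterm])
# [1] "VT11" "VT12" "HT07" "HT11" "HT08" "HT13"
```

If stterm is stored as class Date rather than character, wrapping it in as.character() before the lookup is probably the safest option.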

2 Answers

Your code relies on quite a bit of hard-coding, which is error-prone and becomes tedious if you have many dates that you wish to map to periods.

Here are some alternatives, in which your dates are first converted to class Date using as.Date. This makes it easier to extract the month and map it to the period "VT" or "HT", and to extract the year.

In the first example, I use cut which "divides the range of x into intervals and codes the values in x according to which interval they fall.":

# some dates which are converted to proper R dates
dates <- as.Date(c("2006-09-01", "2007-02-01", "2008-09-01", "2009-01-19"))

# extract month
month <- as.integer(format(dates, "%m"))

# extract year
year <- format(dates, "%y")

# cut the months into intervals and label the levels
term <- cut(x = month, breaks = c(0, 8, 12), labels = c("VT", "HT"))

# paste 'term' and 'year' together
paste0(term, year)
# [1] "HT06" "VT07" "HT08" "VT09" 

In the second example, findInterval is used to create a numerical vector of interval indices. This vector is used to extract elements from a 'period' vector. The periods are then pasted with year as above.

paste0(c("VT", "HT")[findInterval(x = month, vec = c(1, 9))], year)
# [1] "HT06" "VT07" "HT08" "VT09"

Finally, a similar, more 'manual' method, which is less convenient if you have many 'breaks' and intervals to which you wish to map your dates:

paste0(c("VT", "HT")[as.integer(month > 8) + 1], year)
# [1] "HT06" "VT07" "HT08" "VT09"
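
To tie this back to the question, here is a minimal sketch of how the same idea might be applied to a data frame to create the regyear column. The small `both` below is rebuilt from head(both) and only contains stterm, and the helper name `term_label` is made up for illustration.

```r
# Small stand-in for 'both', rebuilt from head(both) in the question
both <- data.frame(
  stterm = c("2011-01-19", "2012-01-19", "2007-09-01",
             "2011-09-01", "2008-09-01", "2013-09-01"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: months 1-8 become "VT", months 9-12 become "HT"
term_label <- function(x) {
  d <- as.Date(x)
  month <- as.integer(format(d, "%m"))
  year <- format(d, "%y")
  paste0(ifelse(month > 8, "HT", "VT"), year)
}

# Assign into the data frame so the new column is stored with the data
both$regyear <- term_label(both$stterm)
both$regyear
# [1] "VT11" "VT12" "HT07" "HT11" "HT08" "HT13"
```

Missing dates would need separate handling, because paste0 turns NA into the text "NA".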

Another relevant Q&A here.

Henrik

You could do it like this:

both$regyear <- factor(both$stterm, labels = c("2007-09-01"="HT07","2008-09-01"="HT09",
                                               "2011-01-19"="VT11","2011-09-01"="HT11",
                                               "2012-01-19"="VT12","2013-09-01"="HT13"))

There are several problems in your original code:

  1. It did not create a new variable in your data frame: regyear <- factor(both$stterm, ... should be both$regyear <- factor(both$stterm, ...
  2. You had no commas between the levels/labels.
  3. You had too many levels for the given example dataset (see these instructions on how to give a reproducible example).
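
The missing-comma problem from point 2 can be reproduced in isolation with a small sketch like the one below. The strings `bad` and `good` are shortened, made-up versions of the original call, and the message text is what I would expect in an English locale.

```r
# Two versions of the same call as text: without and with a comma
bad  <- 'c("2007-09-01" = "HT07" "2011-01-19" = "VT11")'
good <- 'c("2007-09-01" = "HT07", "2011-01-19" = "VT11")'

# parse() only checks syntax; the missing comma is reported as an error
msg <- tryCatch({
  parse(text = bad)
  "no error"
}, error = function(e) conditionMessage(e))
grepl("unexpected string constant", msg)
# [1] TRUE

# With the comma, the text parses and evaluates to a named character vector
labs <- eval(parse(text = good))
names(labs)
# [1] "2007-09-01" "2011-01-19"
```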
Jaap