I have a dataset with a column called BirthYear that contains the years in which people were born. I need to create a new column that contains "young" if BirthYear is > 1993 and "old" if BirthYear is < 1993. I've tried using the if function but I can't seem to get it to work. I would appreciate it if you could let me know how to do it. Thanks!

wibeasley

2 Answers

I also like cut() for this, especially if you want the result to be a factor.

year    <- sample(1989:1999, size=20, replace=TRUE) # Arbitrary vector of years
breaks  <- c(-Inf, 1993, Inf)                       # The 3 bounds of the 2 intervals
labels  <- c("old", "young")                        # The 2 labels of the 2 intervals

# right=FALSE closes each interval on the left, so 1993 itself is labeled "young"
binary  <- cut(x=year, breaks=breaks, labels=labels, right=FALSE)

# Inspect
data.frame(year, binary)

The result:

   year binary
1  1993  young
2  1997  young
3  1989    old
4  1998  young
5  1999  young
6  1989    old
7  1994  young
8  1991    old
9  1991    old
10 1991    old
...
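
The question doesn't say which group 1993 itself belongs to, and the `right` argument is what decides it. Here is a minimal sketch of the difference, assuming a small made-up data frame `ds` (the name and the three years are only for illustration) with the same breaks and labels as above:

```r
# Hypothetical data frame standing in for the asker's dataset
ds <- data.frame(BirthYear = c(1992, 1993, 1994))

breaks <- c(-Inf, 1993, Inf)
labels <- c("old", "young")

# right = FALSE: intervals are [-Inf, 1993) and [1993, Inf), so 1993 is "young"
ds$left_closed  <- cut(ds$BirthYear, breaks = breaks, labels = labels, right = FALSE)

# right = TRUE (the default): intervals are (-Inf, 1993] and (1993, Inf], so 1993 is "old"
ds$right_closed <- cut(ds$BirthYear, breaks = breaks, labels = labels, right = TRUE)

ds
```

Pick whichever side matches how you want the boundary year treated.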

This is close to a duplicate, but involves custom labels.

If you have to inspect more than one variable eventually, look at dplyr::case_when().
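
A rough sketch of the case_when() approach, assuming dplyr is installed and using a made-up data frame `ds` and a made-up column name `age_group`. It follows the question's strict inequalities literally, so 1993 and missing years fall through to NA:

```r
library(dplyr)

# Hypothetical data; the column name BirthYear comes from the question
ds <- data.frame(BirthYear = c(1989, 1993, 1997, NA))

ds <- ds %>%
  mutate(
    age_group = case_when(
      BirthYear > 1993 ~ "young",
      BirthYear < 1993 ~ "old",
      TRUE             ~ NA_character_  # 1993 itself and missing years
    )
  )

ds
```

Conditions are checked in order and each one can refer to any column of the data frame, which is why it scales to more than one variable. The result is a character column; wrap it in factor() if you need a factor.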

wibeasley

Another option is to use dplyr::recode_factor, as below:

library(dplyr)

set.seed(1)
year    <- sample(1970:2005, size=10, replace=TRUE)

year
#[1] 2001 1975 1979 1994 1974 1973 1985 1994 1975 1981

# year > 1993 is TRUE for the "young" group, as the question asks
recode_factor(as.factor(year > 1993), 'TRUE' = "young", 'FALSE' = "old")
#[1] young old   old   young old   old   old   young old   old
#Levels: young old
MKR