I have a dataset with a column called BirthYear that contains the years in which people were born. I need to create a new column that contains "young" if BirthYear is > 1993 and "old" if BirthYear is < 1993. I've tried using the if function but I can't seem to get it to work. I would appreciate it if you could let me know how to do it, thanks!
look at `?ifelse` – dww Mar 14 '18 at 21:26
`df$BirthYear = ifelse(df$BirthYear < 1993, 'old','young')` – YOLO Mar 14 '18 at 21:29
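To expand on the comments above, here is a minimal sketch of the `ifelse()` approach that writes to a new column instead of overwriting `BirthYear`. The data frame `df`, its values, and the column name `AgeGroup` are made up for illustration; only `BirthYear` comes from the question. The question does not say what should happen for exactly 1993, so treating 1993 as "young" here is an assumption.

```r
# Hypothetical example data; only the column name BirthYear comes from the question
df <- data.frame(BirthYear = c(1989, 1993, 1997, 1991, 2001))

# ifelse() is vectorised: it evaluates the test for every row at once,
# whereas if() expects a single TRUE/FALSE value
df$AgeGroup <- ifelse(df$BirthYear < 1993, "old", "young")

df
```

If 1993 should count as "old" instead, changing the test to `df$BirthYear <= 1993` should be enough.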
2 Answers
I also like `cut()` for this, especially if you want the result to be a factor.
year <- sample(1989:1999, size=20, replace=TRUE) # Arbitrary vector of years
breaks <- c(-Inf, 1993, Inf) # The 3 bounds of the 2 intervals
labels <- c("old", "young") # The 2 labels of the 2 intervals
# right=FALSE makes the intervals left-closed, so 1993 itself is labelled "young"
binary <- cut(x=year, breaks=breaks, labels=labels, right=FALSE)
# Inspect
data.frame(year, binary)
The result:
year binary
1 1993 young
2 1997 young
3 1989 old
4 1998 young
5 1999 young
6 1989 old
7 1994 young
8 1991 old
9 1991 old
10 1991 old
...
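Since the question leaves the year 1993 itself unspecified, it may help to see how the `right` argument of `cut()` decides which side the boundary falls on. The sketch below uses three hand-picked years (not data from the question), and the names `left_closed` and `right_closed` are only illustrative.

```r
year <- c(1992, 1993, 1994)        # hand-picked values around the boundary
breaks <- c(-Inf, 1993, Inf)
labels <- c("old", "young")

# right = FALSE: intervals are [-Inf, 1993) and [1993, Inf), so 1993 is "young"
left_closed <- cut(year, breaks = breaks, labels = labels, right = FALSE)

# right = TRUE (the default): intervals are (-Inf, 1993] and (1993, Inf], so 1993 is "old"
right_closed <- cut(year, breaks = breaks, labels = labels, right = TRUE)

data.frame(year, left_closed, right_closed)
```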
This is close to a duplicate, but involves custom labels.
If you have to inspect more than one variable eventually, look at `dplyr::case_when()`.
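As a rough sketch of what that could look like, assuming dplyr is installed: the data frame `df` and the column name `AgeGroup` are hypothetical. The two conditions follow the question literally (> 1993 and < 1993), so a birth year of exactly 1993 matches neither and comes out as `NA`, as does a missing birth year.

```r
library(dplyr)

# Hypothetical example data; only the column name BirthYear comes from the question
df <- data.frame(BirthYear = c(1989, 1993, 1997, NA))

df <- df %>%
  mutate(AgeGroup = case_when(
    BirthYear > 1993 ~ "young",   # conditions are checked in order
    BirthYear < 1993 ~ "old"      # rows matching no condition become NA
  ))

df
```

Each condition is an ordinary logical expression, so it can also refer to other columns of the data frame, which is where `case_when()` tends to be more convenient than nested `ifelse()` calls.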

wibeasley
Another option is to use `dplyr::recode_factor`, as below:
library(dplyr)
set.seed(1)
year <- sample(1970:2005, size=10, replace=TRUE)
year
#[1] 2001 1975 1979 1994 1974 1973 1985 1994 1975 1981
# TRUE means born after 1993, so it maps to "Young"; FALSE maps to "Old"
recode_factor(as.factor(year > 1993), 'TRUE' = "Young", 'FALSE' = "Old")
#[1] Young Old Old Young Old Old Old Young Old Old
#Levels: Young Old
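For comparison, a similar TRUE/FALSE-to-label mapping can be sketched in base R with `factor()`, without dplyr. This is a swapped-in alternative, not part of the answer above. The years are hand-picked for illustration and `age_group` is a hypothetical name. Here `TRUE` (born after 1993) is labelled "Young", matching the question.

```r
year <- c(2001, 1975, 1994, 1993)   # hand-picked example values

# levels gives the values to look for; labels gives their names, in the same order
age_group <- factor(year > 1993,
                    levels = c(TRUE, FALSE),
                    labels = c("Young", "Old"))

data.frame(year, age_group)
```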

MKR