I'm relatively new to programming in R and have run into an issue calculating age from two date variables I created. I'd prefer to use the tidyverse packages (lubridate, tidyr, dplyr), since those are the ones I'm trying to learn, but I'm open to whatever works best. The data comes from the Lahman baseball package. Please feel free to rewrite my awful code.
My code is below:
library(pacman)
p_load("tidyverse", "dplyr", "ggplot2", "lubridate", "stats", "Lahman")
#sort Batting dataset by playerID
Batting.df <- Batting[order(Batting$playerID), ]
#sort Master dataset by playerID
Master.df <- Master[order(Master$playerID), ]
#select variables to keep from Master df
Master.df <- Master.df %>% select(playerID, birthDay, birthMonth, birthYear, nameFirst, nameLast)
#merge Master.df and Batting.df
Batting.df <- merge(Batting.df, Master.df, by = "playerID")
#concatenate first and last name
Batting.df <- unite(Batting.df, Name, c(nameFirst, nameLast), sep = ' ', remove = TRUE)
#drop NA values to avoid incorrect calculations of age
Batting.df <- Batting.df %>% tidyr::drop_na(c(birthDay, birthMonth, birthYear))
#add variable of DOB
Batting.df <- Batting.df %>%
  tidyr::unite(DOB, c(birthMonth, birthDay, birthYear), sep = "-") %>%
  dplyr::mutate(DOB = lubridate::parse_date_time(DOB, "mdy"))
#add variable of opening day by season
Batting.df <- Batting.df %>%
  dplyr::mutate(openingMonth = 4, openingDay = 1) %>%
  tidyr::unite(seasonBegin, c(openingMonth, openingDay, yearID), sep = "-") %>%
  dplyr::mutate(seasonBegin = lubridate::parse_date_time(seasonBegin, "mdy"))
My question: how do I add an "Age" variable equal to the number of years between "DOB" and "seasonBegin"? I've tried with() and lubridate::time_length(), but can't get either to work, and the examples I found use specific dates rather than columns of a data frame.
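To make the goal concrete, here is a minimal sketch of the kind of column-wise calculation I'm after. It uses a toy data frame with made-up dates rather than the Lahman data, and it assumes that lubridate::interval() combined with lubridate::time_length() is a reasonable way to do this. The name toy and the choice to round down to completed years with floor() are my own assumptions, and I'm not sure this is the idiomatic approach.

```r
library(dplyr)
library(lubridate)

# Toy data (made-up dates, not from Lahman) standing in for DOB and seasonBegin
toy <- tibble(
  DOB         = ymd(c("1980-04-02", "1990-03-15")),
  seasonBegin = ymd(c("2000-04-01", "2010-04-01"))
)

# Age in completed years: length of the DOB -> seasonBegin interval, rounded down
toy <- toy %>%
  mutate(Age = floor(time_length(interval(DOB, seasonBegin), "years")))
```

The first toy player's birthday falls one day after the season start, so rounding down should count them as a year younger than a plain subtraction of the two years would.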
Any help would be greatly appreciated.