I'm relatively new to programming in R and have run into an issue calculating age from two date variables I created. I would prefer to use tidyverse packages (lubridate, tidyr, dplyr), since those are the ones I'm trying to learn, but I'm open to whatever works best. The data is from the Lahman baseball package. Please feel free to rewrite my awful code.

My code is below:

library(pacman)
p_load("tidyverse", "dplyr", "ggplot2", "lubridate", "stats", "Lahman")

#sort Batting dataset by playerID
Batting.df <- Batting[order(Batting$playerID), ]

#sort Master dataset by playerID
Master.df <- Master[order(Master$playerID), ]

#select variables to keep from Master df
Master.df <- Master.df %>% select(playerID, birthDay, birthMonth, birthYear, nameFirst, nameLast)

#merge Master.df and Batting.df
Batting.df <- merge(Batting.df, Master.df, by = "playerID")

#concatenate first and last name
Batting.df <- unite(Batting.df, Name, c(nameFirst, nameLast), sep = ' ', remove = TRUE)

#drop NA values to avoid incorrect calculations of age
Batting.df <- Batting.df %>% tidyr::drop_na(c(birthDay, birthMonth, birthYear)) 

#add variable of DOB
Batting.df <- Batting.df %>% tidyr::unite(DOB, c(birthMonth, birthDay, birthYear), sep = "-") %>%
            dplyr::mutate(DOB = lubridate::parse_date_time(DOB, "mdy"))

#add variable of opening day by season
Batting.df <- Batting.df %>% dplyr::mutate(openingMonth = 4) %>% 
            dplyr::mutate(openingDay = 1) %>%
            tidyr::unite(seasonBegin, c(openingMonth, openingDay, yearID), sep = "-") %>%
            dplyr::mutate(seasonBegin = lubridate::parse_date_time(seasonBegin, "mdy"))

My question is: how do I create and add an "Age" variable giving the number of years between "DOB" and "seasonBegin"? I've tried with() and lubridate::time_length() but can't get them to work, and the examples I found use specific dates rather than columns.

Any help would be greatly appreciated.

Rockets
  • Can you provide a reproducible example of your dataset ? see: https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – dc37 Feb 25 '20 at 02:00
  • Just a passing comment: If you're "trying to learn tidyverse", then may I suggest replacing `merge` with `inner_join` or one of the other dozen or so `_join` functions. – Edward Feb 25 '20 at 02:06
  • Did you try `difftime()`? – Edward Feb 25 '20 at 02:10
  • Did you check other threads on SO search? Does [this example](https://stackoverflow.com/questions/60178598/thoughts-on-generating-an-age-variable-based-on-years/60180871#60180871) work for you? – rdornas Feb 25 '20 at 02:41
  • @dc37 To reproduce the dataset, you can use `install.packages("Lahman")` – Rockets Feb 25 '20 at 03:53
  • @Edward `merge` is just the first way I learned; I will give the `_join` functions a go. And yes, I did try `difftime()`, but I was using year as the unit, so I'll adjust and see if I can get it to work. – Rockets Feb 25 '20 at 03:55
  • Does this answer your question? [Thoughts on Generating an Age Variable Based on Years](https://stackoverflow.com/questions/60178598/thoughts-on-generating-an-age-variable-based-on-years) – rdornas Feb 25 '20 at 04:35
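
To illustrate the `inner_join` suggestion from the comments, here is a minimal sketch. The two small tibbles are hand-entered stand-ins for the Lahman tables (an assumption, so the example runs without the package), and the object names `batting`, `people`, and `joined` are hypothetical.

```r
library(dplyr)

# Toy stand-ins for Lahman's Batting and Master tables (hand-entered rows)
batting <- tibble(
  playerID = c("aaronha01", "aaronha01", "ruthba01"),
  yearID   = c(1954L, 1955L, 1914L)
)
people <- tibble(
  playerID   = c("aaronha01", "ruthba01"),
  birthYear  = c(1934L, 1895L),
  birthMonth = c(2L, 2L),
  birthDay   = c(5L, 6L)
)

# Keep only rows whose playerID appears in both tables,
# much like merge(batting, people, by = "playerID")
joined <- inner_join(batting, people, by = "playerID")
```

Unlike `merge`, the join keeps the row order of the left table, so sorting both tables by `playerID` beforehand is not needed.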

2 Answers

Try this:

Batting.df %>% dplyr::mutate(openingMonth = 4) %>% 
  dplyr::mutate(openingDay = 1) %>%
  tidyr::unite(seasonBegin, c(openingMonth, openingDay, yearID), sep = "-") %>%
  dplyr::mutate(seasonBegin = lubridate::parse_date_time(seasonBegin, "mdy"),
                Age=as.numeric(difftime(seasonBegin, DOB, units="days")/365.25))

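difftime() has no "years" unit (the largest it offers is "weeks"), so take the difference in days, divide by 365.25 (the average length of a year once leap years are counted), and strip the units with as.numeric(). The result is a close approximation rather than an exact calendar age.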
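
As a quick check of the days-divided-by-365.25 idea outside the pipeline, here is a small base R sketch. The two pairs of dates are hand-entered for illustration, and the variable names (`dob`, `season_begin`, `age_days`, `age_years`) are hypothetical.

```r
# Hand-entered example dates (illustrative, not pulled from Lahman)
dob          <- as.Date(c("1934-02-05", "1895-02-06"))
season_begin <- as.Date(c("1954-04-01", "1914-04-01"))

# difftime() works element-wise on whole vectors, so it also works on columns
age_days <- difftime(season_begin, dob, units = "days")

# Drop the "days" units, then convert to approximate years
age_years <- as.numeric(age_days) / 365.25

# Whole years, if that is what "Age" should hold
age_whole <- floor(age_years)
```

Because everything here is vectorised, the same expressions can go straight inside `mutate()` with column names in place of `dob` and `season_begin`.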

Edward

This worked as well:

Batting.df <- Batting.df %>% dplyr::mutate(openingMonth = 4) %>% 
         dplyr::mutate(openingDay = 1) %>%
         tidyr::unite(seasonBegin, c(openingMonth, openingDay, yearID), sep = "-") %>%
         dplyr::mutate(seasonBegin = lubridate::parse_date_time(seasonBegin, "mdy")) %>% 
         dplyr::mutate(Age = (DOB %--% seasonBegin) / years(1))
Rockets
  • Yes, and you could also combine the last two mutate statements into one. This is one selling point for dplyr. ;) – Edward Feb 25 '20 at 04:35
  • @Edward Thanks, for this project it just helps me to break them into steps. I struggle with combining outside of dplyr though. Your help on this is much appreciated! – Rockets Feb 25 '20 at 04:51
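
For reference, the steps discussed above can be folded into one `mutate()` call. This is only a sketch under a few assumptions: it swaps the `unite()` plus `parse_date_time()` steps for `lubridate::make_date()`, it uses `lubridate::time_length()` on an interval (the function mentioned in the question), and it runs on a hand-entered toy tibble with hypothetical names (`batting`, `batting_age`) instead of the real Lahman tables.

```r
library(dplyr)
library(lubridate)

# Toy rows standing in for the merged Batting/Master data (hand-entered)
batting <- tibble(
  playerID   = c("aaronha01", "ruthba01"),
  yearID     = c(1954L, 1914L),
  birthYear  = c(1934L, 1895L),
  birthMonth = c(2L, 2L),
  birthDay   = c(5L, 6L)
)

batting_age <- batting %>%
  mutate(
    # Build Date columns directly from the numeric parts
    DOB         = make_date(birthYear, birthMonth, birthDay),
    # April 1 of each season, as assumed in the question
    seasonBegin = make_date(yearID, 4L, 1L),
    # Length of the interval from DOB to seasonBegin, in years
    Age         = time_length(interval(DOB, seasonBegin), unit = "years")
  )
```

One side effect worth noting: `unite()` drops its input columns by default (`remove = TRUE`), so the original pipeline loses `yearID`, whereas this version keeps it alongside the new columns. Wrap `Age` in `floor()` if whole years are wanted.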