
I'm an absolute R beginner here working on a Master's project.

I have a data.frame that contains information on trotting horses (their wins, earnings, time records and such). The data is organised so that each row holds the information for one year in which the horse competed, plus a first row per horse labelled "Total" that summarises every variable over its whole competing life. It looks like this:

(Image: data example)

I created a new variable with their age using the age_calc function in the eeptools package:

travdata$Age<-age_calc(as.Date(travdata$Birth.date), enddate=as.Date("2016-12-31"),
                       units="years")

That works with no problems. What I'm trying to figure out is whether there is a way to calculate each horse's age for every specific year I have info on. That is, the "Total" row would hold the age as of 2016-12-31, the row for 2015 would hold the age in 2015, and so on. I've been trying to include if statements in age_calc, but that doesn't work, and I'm really at a loss on how best to do this.

Any literature or help you could point me to would be much, much appreciated.

MWE

travdata <- data.frame(
    "Id.Number"=c(rep("1938-98",3),rep("1803-97",7),rep("1221-03",4)),
    "Name"=c(rep("Muuttuva",3),rep("Pelson Poika",7),rep("Muusan Muisto",4)),
    "Sex"=c(rep("Mare",3),rep("Gelding",7),rep("Gelding",4)),
    "Birth.year"=c(rep(1998,3),rep(1997,7),rep(2003,4)),
    "Birth.date"=c(rep("1998-07-01",3),rep("1997-07-14",7),rep("2003-05-07",4)),
    "Competition.year" = c("Total",2005,2004,"Total",2003,2004,2006,2005,2002,2001,2008,2010,"Total",2009),
    "starts"=c(20,11,9,44,21,6,7,5,3,2,1,1,4,2),
    "X1st.placements"=c(0,0,0,3,3,0,0,0,0,0,0,0,0,0),
    "X2nd.placements"=c(2,2,0,1,0,1,0,0,0,0,0,0,0,0),
    "X3rd.placements"=c(2,2,0,1,1,0,0,0,0,0,0,0,0,0),
    "Earnings.euro"=c(1525,1425,100,2078,1498,580,0,0,0,0,0,0,10,10)
)
rgunning
  • 3
    Welcome to StackOverflow. Please take a look at these tips on how to produce a [minimum, complete, and verifiable example](http://stackoverflow.com/help/mcve), as well as this post on [creating a great example in R](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). – lmo May 23 '17 at 15:25
  • 3
    I would keep this data tidy and drop the total row. You can easily get the totals back later using `dplyr::group_by`, or you can just move these rows to a new table. If you need help with this, please use `dput(head(travdata, 10))` and paste the results into your question. This will create a minimum example for someone to start with. – Ian Wesley May 23 '17 at 15:38

1 Answer


The trick is to filter out the "Total" rows, which cannot be parsed as a year, and to pass a format to as.Date() so that the remaining year strings are converted to dates:

library(eeptools)
travdata <- data.frame(
    "Id.Number"=c(rep("1938-98",3),rep("1803-97",7),rep("1221-03",4)),
    "Name"=c(rep("Muuttuva",3),rep("Pelson Poika",7),rep("Muusan Muisto",4)),
    "Sex"=c(rep("Mare",3),rep("Gelding",7),rep("Gelding",4)),
    "Birth.year"=c(rep(1998,3),rep(1997,7),rep(2003,4)),
    "Birth.date"=c(rep("1998-07-01",3),rep("1997-07-14",7),rep("2003-05-07",4)),
    "Competition.year" = c("Total",2005,2004,"Total",2003,2004,2006,2005,2002,2001,2008,2010,"Total",2009),
    "starts"=c(20,11,9,44,21,6,7,5,3,2,1,1,4,2),
    "X1st.placements"=c(0,0,0,3,3,0,0,0,0,0,0,0,0,0),
    "X2nd.placements"=c(2,2,0,1,0,1,0,0,0,0,0,0,0,0),
    "X3rd.placements"=c(2,2,0,1,1,0,0,0,0,0,0,0,0,0),
    "Earnings.euro"=c(1525,1425,100,2078,1498,580,0,0,0,0,0,0,10,10)
)

travdata$Age<-age_calc(as.Date(travdata$Birth.date), 
                       enddate=as.Date("2016-12-31"), units="years")

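# Drop the "Total" rows: "Total" cannot be converted to a date
competitions <- travdata[travdata$Competition.year!="Total",]
# Age at each competition year. With format="%Y", as.Date() takes the
# month and day from the current date. precise=FALSE is spelled out
# instead of the positional F.
competitions$Competition.age<-age_calc(
                 as.Date(competitions$Birth.date),
                 enddate=as.Date(competitions$Competition.year, format="%Y"), 
                 units="years", precise=FALSE)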
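
A note on the end date: with `format="%Y"`, `as.Date()` fills in the missing month and day from the current date, so the computed age shifts depending on the day the script is run. Below is a minimal sketch, assuming the age is wanted as of 31 December of each competition year (an assumption that mirrors the 2016-12-31 cut-off used for the "Total" rows). It uses base R only; the three-row data frame is a stand-in built from values in the question, and dividing by 365.25 is an approximation, not necessarily what `age_calc` does internally.

```r
# Stand-in for the non-"Total" rows of travdata (values from the question).
competitions <- data.frame(
  Name = c("Muuttuva", "Muuttuva", "Pelson Poika"),
  Birth.date = c("1998-07-01", "1998-07-01", "1997-07-14"),
  Competition.year = c("2005", "2004", "2003"),
  stringsAsFactors = FALSE
)

dob <- as.Date(competitions$Birth.date)

# as.character() guards against the year being stored as a factor or a number;
# pasting "-12-31" pins the end date to the last day of that year (assumption).
end_of_year <- as.Date(paste0(as.character(competitions$Competition.year), "-12-31"))

# Sanity checks before computing any age.
stopifnot(!anyNA(dob), !anyNA(end_of_year), all(end_of_year >= dob))

# Approximate age in years, using 365.25 days per year.
age_days <- as.numeric(difftime(end_of_year, dob, units = "days"))
competitions$Competition.age <- age_days / 365.25
```

The same `end_of_year` vector could be passed as `enddate` to `age_calc` in place of `as.Date(competitions$Competition.year, format="%Y")`.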
rgunning
  • 1
    Yes this did it! Thank you thank you!! That format tweak to the code was what was missing for me I think. I'll keep it in mind for the future :) – Laura Bas May 23 '17 at 16:40
  • Hey @rgunning! Thanks again for your help last time. I'm going back over this code because I want to convert another age variable I made, `Real.age`, into months, and now not only is this old code of yours not working, nothing is! (I saved the script of how I produced my current dataset.) This is what I did for the `Real.age` variable: `Data_year$Real.age<-age_calc(as.Date(Data_year$Birth.date), enddate=as.Date(Data_year$Competition.year, format="%Y"), units="years", F)`. I tried changing the final `years` to `months` and all it says is that an origin must be supplied. Even for your code! Help? – Laura Bas Jul 18 '17 at 19:55
  • @LauraBas Odd that it won't work for you. Swapping `units="years"` to `units="months"` should work. Take a look at `as.Date(Data_year$Birth.date)` and `as.Date(Data_year$Competition.year, format="%Y")` to make sure that all values have dates. The age_calc function doesn't cope well with NaN values. – rgunning Jul 20 '17 at 10:36
  • Hey @rgunning. I checked everything and there are no NaNs or NAs. I've tried playing around with the line of code and I either get the "origin must be supplied" error, or this: `Error in if (any(enddate < dob)) { : missing value where TRUE/FALSE needed`. I really am at a loss as to what to do. I tried going back to the original data.frame I used when I asked and it doesn't work for that either. I've looked through the data by eye, so to speak, and can't find anything weird in it. Any ideas? – Laura Bas Jul 24 '17 at 11:49
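
One possible cause, which is not confirmed anywhere in this thread: if `Competition.year` is stored as a number (for example after re-reading a saved file that no longer contains "Total" rows), older versions of R stop in `as.Date()` with "'origin' must be supplied", and any `NA` in either date vector would make the `if (any(enddate < dob))` check inside `age_calc` fail. The sketch below shows how the column types and the converted dates could be inspected. `Data_year` here is a hypothetical two-row stand-in, not the real dataset, and the 31 December end date is an assumption.

```r
# Hypothetical stand-in: Competition.year stored as numbers.
Data_year <- data.frame(
  Birth.date = c("1998-07-01", "1997-07-14"),
  Competition.year = c(2005, 2003),
  stringsAsFactors = FALSE
)

# Inspect how each column is stored before converting.
class(Data_year$Birth.date)        # "character"
class(Data_year$Competition.year)  # "numeric"

# Going through character avoids as.Date()'s numeric method entirely.
year_chr <- as.character(Data_year$Competition.year)
enddate <- as.Date(paste0(year_chr, "-12-31"))
dob <- as.Date(Data_year$Birth.date)

# Rows that would trip age_calc(): missing dates or an end date before birth.
problem_rows <- which(is.na(dob) | is.na(enddate) | enddate < dob)
```

If `problem_rows` comes back empty, the two date vectors are at least safe to compare, and the issue is more likely to lie elsewhere.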