
The data below have a number of observation dates for two individuals.

    dat <- structure(list(GenIndID = c("BHS_034", "BHS_034", "BHS_068", 
"BHS_068", "BHS_068", "BHS_068", "BHS_068", "BHS_068", "BHS_068", 
"BHS_068", "BHS_068"), IndID = c("BHS_034_A", "BHS_034_A", "BHS_068_A", 
"BHS_068_A", "BHS_068_A", "BHS_068_A", "BHS_068_A", "BHS_068_A", 
"BHS_068_A", "BHS_068_A", "BHS_068_A"), Fate = c("Mort", "Mort", 
"Alive", "Alive", "Alive", "Alive", "Alive", "Alive", "Alive", 
"Alive", "Alive"), SurveyID = c("GYA13-1", "GYA14-1", "GYA13-1", 
"GYA14-1", "GYA14-2", "GYA15-1", "GYA16-1", "GYA16-2", "GYA17-1", 
"GYA17-3", "GYA15-2"), SurveyDt = structure(c(1379570400, 1407477600, 
1379570400, 1407477600, 1409896800, NA, 1462946400, 1474351200, 
1495519200, 1507010400, 1441951200), tzone = "", class = c("POSIXct", 
"POSIXt"))), row.names = c(NA, 11L), .Names = c("GenIndID", "IndID", 
"Fate", "SurveyID", "SurveyDt"), class = "data.frame")

  > dat
   GenIndID     IndID  Fate SurveyID   SurveyDt
1   BHS_034 BHS_034_A  Mort  GYA13-1 2013-09-19
2   BHS_034 BHS_034_A  Mort  GYA14-1 2014-08-08
3   BHS_068 BHS_068_A Alive  GYA13-1 2013-09-19
4   BHS_068 BHS_068_A Alive  GYA14-1 2014-08-08
5   BHS_068 BHS_068_A Alive  GYA14-2 2014-09-05
6   BHS_068 BHS_068_A Alive  GYA15-1       <NA>
7   BHS_068 BHS_068_A Alive  GYA16-1 2016-05-11
8   BHS_068 BHS_068_A Alive  GYA16-2 2016-09-20
9   BHS_068 BHS_068_A Alive  GYA17-1 2017-05-23
10  BHS_068 BHS_068_A Alive  GYA17-3 2017-10-03
11  BHS_068 BHS_068_A Alive  GYA15-2 2015-09-11

The SurveyDt column is stored as a POSIXct time stamp. I am trying to compute the maximum date within each GenIndID group with dplyr. In the code below, I use dplyr to create two new columns. For AAA, why is <NA> produced for the second individual when max is called with na.rm = FALSE? For BBB, I want the maximum date for alive individuals only, but every value comes back as NA (printed as a plain NA rather than the <NA> I would prefer).

library(dplyr)

dat %>% group_by(GenIndID) %>%
  mutate(AAA = max(SurveyDt, na.rm = FALSE),
         BBB = ifelse(Fate == "Alive", max(SurveyDt, na.rm = FALSE), NA)) %>%
  as.data.frame()

   GenIndID     IndID  Fate SurveyID   SurveyDt        AAA BBB
1   BHS_034 BHS_034_A  Mort  GYA13-1 2013-09-19 2014-08-08  NA
2   BHS_034 BHS_034_A  Mort  GYA14-1 2014-08-08 2014-08-08  NA
3   BHS_068 BHS_068_A Alive  GYA13-1 2013-09-19       <NA>  NA
4   BHS_068 BHS_068_A Alive  GYA14-1 2014-08-08       <NA>  NA
5   BHS_068 BHS_068_A Alive  GYA14-2 2014-09-05       <NA>  NA
6   BHS_068 BHS_068_A Alive  GYA15-1       <NA>       <NA>  NA
7   BHS_068 BHS_068_A Alive  GYA16-1 2016-05-11       <NA>  NA
8   BHS_068 BHS_068_A Alive  GYA16-2 2016-09-20       <NA>  NA
9   BHS_068 BHS_068_A Alive  GYA17-1 2017-05-23       <NA>  NA
10  BHS_068 BHS_068_A Alive  GYA17-3 2017-10-03       <NA>  NA
11  BHS_068 BHS_068_A Alive  GYA15-2 2015-09-11       <NA>  NA
B. Davis
  • https://stackoverflow.com/questions/6668963/how-to-prevent-ifelse-from-turning-date-objects-into-numeric-objects – M-- May 14 '19 at 18:38

1 Answer

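library(dplyr)
dat %>% group_by(GenIndID) %>%
  # na.rm = TRUE ignores the missing SurveyDt when taking the group maximum
  mutate(AAA = max(SurveyDt, na.rm = TRUE),
         # ifelse() drops the POSIXct class, so convert the epoch seconds back
         BBB = as.POSIXct(ifelse(Fate == "Alive", max(SurveyDt, na.rm = TRUE), NA),
                          origin = "1970-01-01")) %>%
  as.data.frame()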
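
A possible alternative, in the spirit of the question linked in the comments: dplyr's `if_else()` keeps the class of its branches, so the result stays POSIXct without converting from epoch seconds. This is only a sketch, not part of the original answer. The `toy` data frame and the `res` name are made up for illustration and stand in for `dat`.

```r
library(dplyr)

# Hypothetical toy data (not the question's `dat`): two groups, one missing date
toy <- data.frame(
  GenIndID = c("A", "A", "B", "B", "B"),
  Fate     = c("Mort", "Mort", "Alive", "Alive", "Alive"),
  SurveyDt = as.POSIXct(c("2013-09-19", "2014-08-08",
                          "2013-09-19", NA, "2017-10-03"), tz = "UTC"),
  stringsAsFactors = FALSE
)

res <- toy %>%
  group_by(GenIndID) %>%
  mutate(
    # Group maximum, ignoring missing dates
    AAA = max(SurveyDt, na.rm = TRUE),
    # Both branches are POSIXct, so the result keeps the date-time class
    BBB = if_else(Fate == "Alive", max(SurveyDt, na.rm = TRUE), as.POSIXct(NA))
  ) %>%
  as.data.frame()

res
```

Here `BBB` should print as `<NA>` for the "Mort" rows, because the column is still a date-time rather than a plain numeric.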
Nimantha
Prem
  • many thanks @Prem. Why does adding the `as.POSIXct` wrapper for BBB change the results for AAA? – B. Davis Oct 11 '17 at 14:28
  • @B.Davis Glad that it helped! As per your requirement, the `AAA` column has the maximum `SurveyDt` for each `GenIndID` group, and the `BBB` column has the maximum `SurveyDt` for `Fate == "Alive"` only. `as.POSIXct` is added because the `ifelse` call in `BBB` converts the time to seconds since the UNIX epoch, whereas the desired output is a date-time. – Prem Oct 11 '17 at 17:53
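
On the follow-up question about `AAA`: the answer's code differs from the question's in two ways, the `as.POSIXct` wrapper and the switch from `na.rm = FALSE` to `na.rm = TRUE`. It appears to be the second change that affects `AAA`, while the wrapper only touches `BBB`. The base-R sketch below illustrates both effects separately. The vector `x` and the names `y` and `z` are hypothetical and are not taken from `dat`.

```r
# Hypothetical vector for illustration: three survey dates, one missing
x <- as.POSIXct(c("2013-09-19", NA, "2017-10-03"), tz = "UTC")

max(x, na.rm = FALSE)  # NA: one missing value makes the maximum unknown
max(x, na.rm = TRUE)   # the latest non-missing date

# ifelse() returns a plain numeric vector (seconds since 1970-01-01)
y <- ifelse(!is.na(x), max(x, na.rm = TRUE), NA)
class(y)               # "numeric"

# Converting back restores the date-time class
z <- as.POSIXct(y, origin = "1970-01-01", tz = "UTC")
class(z)               # "POSIXct" "POSIXt"
```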