The question I am posting here is closely linked to another question I posted two days ago about Gompertz aging analysis.

I am trying to construct a survival object, see ?Surv, in R. This will hopefully be used to perform Gompertz analysis to produce an output of two values (see original question for further details).

I have survival data from an experiment in flies that examines rates of aging in various genotypes. The data are available to me in several layouts, so use whichever one suits the answer best.

One data frame (wide.df) looks like this: each genotype (Exp, of which there are ~640) has a row, and the days run in sequence horizontally from day 4 to day 98, with counts of new deaths every two days.

Exp      Day4   Day6    Day8    Day10   Day12   Day14    ...
A        0      0       0       2       3       1        ...

I make the example using this:

wide.df2<-data.frame("A",0,0,0,2,3,1,3,4,5,3,4,7,8,2,10,1,2)
colnames(wide.df2)<-c("Exp","Day4","Day6","Day8","Day10","Day12","Day14","Day16","Day18","Day20","Day22","Day24","Day26","Day28","Day30","Day32","Day34","Day36")

Another version looks like this: each day has a row for each 'Exp', and the number of deaths on that day is recorded.

Exp     Deaths  Day     
A       0       4    
A       0       6
A       0       8
A       2       10
A       3       12
..      ..      ..

To make this example:

df2 <- data.frame(
  Exp    = rep("A", 17),
  Deaths = c(0, 0, 0, 2, 3, 1, 3, 4, 5, 3, 4, 7, 8, 2, 10, 1, 2),
  Day    = seq(4, 36, by = 2)
)
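The two layouts hold the same information, so either can be derived from the other. A minimal base-R sketch of turning the wide example into the long one follows; the names `day_cols` and `long.df` are introduced here for illustration only, and the sketch assumes every count column is named `Day<number>`.

```r
# Rebuild the wide example from the question.
wide.df2 <- data.frame("A", 0, 0, 0, 2, 3, 1, 3, 4, 5, 3, 4, 7, 8, 2, 10, 1, 2)
colnames(wide.df2) <- c("Exp", paste0("Day", seq(4, 36, by = 2)))

# Every column except Exp holds the death count for one day.
day_cols <- setdiff(colnames(wide.df2), "Exp")

# One row per (genotype, day): repeat each Exp once per day column and
# read the counts row by row so that they line up with the days.
long.df <- data.frame(
  Exp    = rep(wide.df2$Exp, each = length(day_cols)),
  Deaths = as.vector(t(as.matrix(wide.df2[, day_cols]))),
  Day    = rep(as.numeric(sub("^Day", "", day_cols)), times = nrow(wide.df2))
)
```

The same code should carry over to a wide data frame with many genotype rows, since the counts are unrolled one row at a time.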

Each genotype has approximately 50 flies in it. What I need help with now is how to go from one of the above dataframes to a working survival object. What does this object look like? And how do I get from the above to the survival object smoothly?

rg255
  • Are there any censored observations? If so, how many are censored at each day? – gung - Reinstate Monica Jul 20 '13 at 13:58
  • @gung what is meant by censored in a Survival object? this is one of the things that confused me! – rg255 Jul 20 '13 at 16:21
  • Imagine you want to know how long, on average, flies will live. So you get a bunch of flies that have just been born (hatched?) and watch them for up to 30 days. Some flies die on day 4, others on day 10, etc. By day 30, only 1 fly is still alive. All you know about how long that fly lived is that it was still alive on day 30, so its lifespan was *greater than* 30 days, but you don't know the actual number. That fly was *censored* at day 30. – gung - Reinstate Monica Jul 20 '13 at 16:57
  • You said that the experiment lasted 98 days, but the last event in the example was on day 36. Are we supposed to assume that there was observation through day 98 and no further deaths? Or are we supposed to assume the total number under observation was 55 in this case, which would imply that you had no censoring and that all the flies were dead at the end of the experiment? – IRTFM Jul 20 '13 at 18:35
  • @dwin this is just a dummy data set, in the real data it is generally 50 flies per genotype but some were lost / killed etc. and all 32000 flies died within 98 days, mean was 60 days – rg255 Jul 21 '13 at 08:14
  • @gung all flies were observed until death so no censoring – rg255 Jul 21 '13 at 08:16
  • If all deaths were observed (so there was no censoring), then there is no need to add in extra lines of data to represent the censoring events. If there were censoring, you would need an extra line of data for the time of each censoring-occurrence. – IRTFM Jul 21 '13 at 16:24

1 Answer

Noting that the total of Deaths was 55 and that you said the number of flies was "around 50", I assumed this was a completely observed process. So you need to replicate each row by its death count, so that there is one row for each death, and assign an event marker of 1. The "long" format is clearly the preferred format. You can then create a Surv-object from 'Day' and 'event':

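library(survival)  # provides Surv()
?Surv
# Repeat each row once per death; rows with Deaths == 0 drop out.
df3 <- df2[rep(rownames(df2), df2$Deaths), ]
str(df3)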
#---------------------
'data.frame':   55 obs. of  3 variables:
 $ Exp   : Factor w/ 1 level "A": 1 1 1 1 1 1 1 1 1 1 ...
 $ Deaths: num  2 2 3 3 3 1 3 3 3 4 ...
 $ Day   : num  10 10 12 12 12 14 16 16 16 18 ...
#----------------------
df3$event <- 1  # every row is an observed death, so no censoring
str(with(df3, Surv(Day, event)))
#------------------
 Surv [1:55, 1:2] 10  10  12  12  12  14  16  16  16  18  ...
 - attr(*, "dimnames")=List of 2
  ..$ : NULL
  ..$ : chr [1:2] "time" "status"
 - attr(*, "type")= chr "right"

Note: If this were being done in the coxph function, the expansion to individual lines of data might not have been needed, since that function allows specification of case weights. (I'm guessing that the other regression functions in the survival package would not have needed this to be done either.) In the past Terry Therneau has expressed puzzlement that people create Surv-objects outside the formula interface of coxph. The intended use of this Surv-object was not described in sufficient detail to know whether a weighted analysis without expansion is possible.
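As a minimal sketch of the weighted route, the example below uses `survfit` rather than `coxph`, because the toy data have no covariates to regress on. It assumes the `survival` package is installed; the names `agg`, `fit_weighted` and `fit_expanded` are made up for this illustration. The `Surv` call sits on the left-hand side of the formula and `Deaths` is passed as the case weight, so no row expansion is done for the weighted fit.

```r
library(survival)

# Long-format example data from the question.
df2 <- data.frame(
  Exp    = rep("A", 17),
  Deaths = c(0, 0, 0, 2, 3, 1, 3, 4, 5, 3, 4, 7, 8, 2, 10, 1, 2),
  Day    = seq(4, 36, by = 2)
)

# Drop days without deaths (a zero weight carries no information)
# and mark every remaining row as an observed event.
agg <- subset(df2, Deaths > 0)
agg$event <- 1

# Weighted fit: one row per day, with Deaths as the case weight.
fit_weighted <- survfit(Surv(Day, event) ~ Exp, data = agg, weights = Deaths)

# Expanded fit (one row per death) for comparison.
df3 <- df2[rep(seq_len(nrow(df2)), df2$Deaths), ]
df3$event <- 1
fit_expanded <- survfit(Surv(Day, event) ~ Exp, data = df3)
```

Comparing `fit_weighted$surv` with `fit_expanded$surv` is a quick way to check that the two routes agree on the data at hand. Whether a given Gompertz routine accepts case weights would have to be checked in its own documentation.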

IRTFM
  • Hi, what would you do if this wasn't a completely observed process (i.e., at the end of the experiment there were still flies alive)? Thank you! – uller Jun 10 '17 at 01:43
  • I should clarify that I am not endorsing the construction of `Surv`-objects outside of the regression function calls. The method endorsed by Therneau is to use a `Surv`-call on the LHS of the formula-call and to supply a dataframe to the `data` argument. – IRTFM Jun 10 '17 at 16:22
  • The survival package has multiple functions for addressing censored data situations. If you were dealing with multiple groups that had different survival processes, you would need more levels for the factor variable. Each row for a group would have an event variable of 1, except for a final row for the last "period", which would have an event variable of 0 and a Deaths entry equal to the number of flies alive at the end of the experiment. (As explained in one of my comments ... 4 years ago.) – IRTFM Jun 10 '17 at 16:23
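The censoring layout described in the last comment can be sketched as follows. The 5 surviving flies are a hypothetical figure invented for this illustration (the real experiment had no censoring), and the names `deaths`, `censored`, `counts`, `flies` and `fit` are likewise illustrative. The sketch assumes the `survival` package is installed.

```r
library(survival)

# Observed deaths from the question's example; event = 1 marks a death.
deaths <- data.frame(
  Exp   = "A",
  Day   = seq(4, 36, by = 2),
  n     = c(0, 0, 0, 2, 3, 1, 3, 4, 5, 3, 4, 7, 8, 2, 10, 1, 2),
  event = 1
)

# HYPOTHETICAL: 5 flies still alive when observation stopped on day 36.
# They get one extra row with event = 0 (right-censored).
censored <- data.frame(Exp = "A", Day = 36, n = 5, event = 0)

counts <- rbind(deaths, censored)
counts <- counts[counts$n > 0, ]

# Expand to one row per fly, then fit through the formula interface.
flies <- counts[rep(seq_len(nrow(counts)), counts$n), c("Exp", "Day", "event")]
fit <- survfit(Surv(Day, event) ~ Exp, data = flies)
```

With censored flies in the data, the estimated survival curve no longer drops to zero at the last observed day, which is the practical difference from the fully observed case.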