5

I have survival data from an experiment in flies which examines rates of aging in various genotypes. The data is available to me in several layouts so the choice of which is up to you, whichever suits the answer best.

One dataframe (wide.df) looks like this, where each genotype (Exp, of which there is ~640) has a row, and the days run in sequence horizontally from day 4 to day 98 with counts of new deaths every two days.

Exp      Day4   Day6    Day8    Day10   Day12   Day14    ...
A        0      0       0       2       3       1        ...

I make the example using this:

wide.df2<-data.frame("A",0,0,0,2,3,1,3,4,5,3,4,7,8,2,10,1,2)
colnames(wide.df2)<-c("Exp","Day4","Day6","Day8","Day10","Day12","Day14","Day16","Day18","Day20","Day22","Day24","Day26","Day28","Day30","Day32","Day34","Day36")

Another version is like this, where each day has a row for each 'Exp' and the number of deaths on that day are recorded.

Exp     Deaths  Day     
A       0       4    
A       0       6
A       0       8
A       2       10
A       3       12
..      ..      ..

To make this example:

df2<-data.frame(c("A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A"),c(0,0,0,2,3,1,3,4,5,3,4,7,8,2,10,1,2),c(4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36))
    colnames(df2)<-c("Exp","Deaths","Day")

What I would like to do is perform a Gompertz Analysis (See second paragraph of "the life table" here). The equation is:

μx = α*e β*x

Where μx is probability of death at a given time, α is initial mortality rate, and β is the rate of aging.

I would like to be able to get a dataframe which has α and β estimates for each of my ~640 genotypes for further analysis later.

I need help going from the above dataframes to an output of these values for each of my genotypes in R.

I have looked through the package flexsurv which may house the answer but I have failed in attempts to find and implement it.

rg255
  • 4,119
  • 3
  • 22
  • 40
  • If there are only 2 parameters it should not be hard to find the 'best' fit. You just need to choose your definition of 'best'. – Dennis Jaheruddin Jul 18 '13 at 08:44
  • I think you might find package [flexsurv](http://cran.r-project.org/web/packages/flexsurv/index.html) useful. – Roland Jul 18 '13 at 08:49
  • I think to achieve a reasonable fit, much more data would be needed than what you provide in your question. So please provide a larger dataset. – Roland Jul 18 '13 at 09:04
  • @Roland More genotypes or more individuals & time per genotype? – rg255 Jul 18 '13 at 09:06
  • @GriffinEvo The latter. – Roland Jul 18 '13 at 09:07
  • @Roland Done, added code to question, now runs to 36 days and >50 individuals. – rg255 Jul 18 '13 at 09:10
  • I think you need to know how many subjects are observed. Otherwise, how would you calculate a probability of death? If you have that, you need to construct a survival object (`?Surv`) and can then use `flexsurvreg` with the Gompertz distribution. – Roland Jul 18 '13 at 09:33
  • @Roland I'm not sure how to get from my dataframes to that Surv object. I can get the number of individuals per genotype, do I then make a dataframe with columns of 'genotype', 'number born', 'number dead', 'days passed' - a bit like my second example but with a born column putting the number of flies in to each genotype on day 0? – rg255 Jul 18 '13 at 14:37
  • I suggest to ask that as a separate question. I've never done survival analyses and don't feel confident enough to give further advice. I had to look up what "right-censored" means ... – Roland Jul 18 '13 at 14:42

1 Answers1

3

This should get you started...

Firstly, for the flexsurvreg function to work, you need to specify your input data as a Surv object (from package:survival). This means one row per observation.

The first thing is to re-create the 'raw' data from the summary tables you provide. (I know rbind is not efficient, but you can always switch to data.table for large sets).

### get rows with >1 death
df3 <- df2[df2$Deaths>1, 2:3]
### expand to give one row per death per time
df3 <- sapply(df3, FUN=function(x) rep(df3[, 2], df3[, 1]))
### each death is 1 (occurs once)
df3[, 1] <- 1
### add this to the rows with <=1 death
df3 <- rbind(df3, df2[!df2$Deaths>1, 2:3])
### convert to Surv object
library(survival)
s1 <- with(df3, Surv(Day, Deaths))
### get parameters for Gompertz distribution
library(flexsurv) 
f1 <- flexsurvreg(s1 ~ 1, dist="gompertz")

giving

> f1$res
              est         L95%        U95%
shape 0.165351912 0.1281016481 0.202602176
rate  0.001767956 0.0006902161 0.004528537

Note that this is an intercept-only model as all your genotypes are A. You can loop this over multiple survival objects once you have re-created the per-observation data as above.

From the flexsurv docs:

Gompertz distribution with shape parameter a and rate parameter b has hazard function

H(x: a, b) = b.e^{ax}

So it appears your alpha is b, the rate, and beta is a, the shape.

dardisco
  • 5,086
  • 2
  • 39
  • 54