
Whenever I use `survfit` in R, I get different values for `n` and `strata`. For example, I get `n`: 150, 167 (these add up to 317, the total number of rows in the input) but `strata`: 149, 163.

From the help page `?survival::survfit.object`:

`n` = total number of subjects in each curve.

`strata` = if there are multiple curves, this component gives the number of elements of the `time` etc. vectors corresponding to the first curve, the second curve, and so on. The names of the elements are labels for the curves.
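To see the two components side by side, here is a minimal sketch on a small hypothetical data frame (`toy` and its values are made up for illustration, not taken from the question's data). It calls `survival::survfit()` directly rather than `survminer::surv_fit()`:

```r
library(survival)

# Hypothetical toy data (not the asker's data): 5 subjects in group A with
# one tied time (t = 3), 4 subjects in group B with t = 7 appearing three times.
toy <- data.frame(
  Time  = c(1, 3, 3, 5, 8,   2, 7, 7, 7),
  Event = c(1, 1, 0, 1, 0,   1, 1, 1, 0),
  Group = c(rep("A", 5), rep("B", 4))
)

fit <- survfit(Surv(time = Time, event = Event) ~ Group, data = toy)

fit$n             # subjects per curve: 5 4
fit$strata        # elements of fit$time per curve: Group=A 4, Group=B 2
length(fit$time)  # 6, i.e. sum(fit$strata)
```

In this sketch group A has 5 subjects but only 4 distinct times, and group B has 4 subjects but only 2 distinct times, so `n` and `strata` differ by the number of tied times in each curve.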

I don't understand why the numbers are different.

EDIT: I did think the issue might be the repeated time values. As you can see in the example dataset, there are 9 instances of duplicate values (18 rows in total). That would mean only 317 - 9 = 308 values are used, but `strata` adds up to 149 + 163 = 312, not 308. The code used is:

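```r
library(survival)   # Surv(), survfit()
library(survminer)  # surv_fit(), a wrapper around survfit()
# x: data frame with columns Time, Event and Group (two groups)
fit <- surv_fit(Surv(time = Time, event = Event) ~ Group, data = x, conf.int = 0.95)
```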

Update: It has to do with repeated times within each group. If I split the data into group A and group B, there is 1 duplicate time in group A and 4 duplicate times in group B. Therefore there would be 317 - 1 - 4 = 312 time points in the plot.

And in each group it would be A: 150 - 1 = 149 and B: 167 - 4 = 163, which is exactly what `strata` shows.
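The difference between the first guess (duplicates counted across the pooled data) and the update (duplicates counted within each group) can be reproduced with a hypothetical toy data frame in which one time value is shared across the two groups. This is a sketch, not the asker's data; `toy`, `pooled_dups`, `within_dups` and `distinct_per_group` are illustrative names:

```r
library(survival)

# Hypothetical toy data: t = 4 occurs once in each group (a duplicate only
# when the groups are pooled); t = 6 repeats in A and t = 10 repeats in B.
toy <- data.frame(
  Time  = c(4, 6, 6, 9,   4, 5, 10, 10),
  Event = 1,
  Group = rep(c("A", "B"), each = 4)
)

# Duplicates counted over the pooled data (the reasoning that gave 308)
pooled_dups <- sum(duplicated(toy$Time))                   # 3
# Duplicates counted within each group (what survfit collapses)
within_dups <- sum(duplicated(toy[, c("Group", "Time")]))  # 2

# Distinct times per group: the value fit$strata should report
distinct_per_group <- tapply(toy$Time, toy$Group, function(t) length(unique(t)))

fit <- survfit(Surv(Time, Event) ~ Group, data = toy)
print(distinct_per_group)  # A: 3, B: 3
print(fit$strata)          # Group=A: 3, Group=B: 3
```

Pooled counting would predict 8 - 3 = 5 rows; within-group counting gives 8 - 2 = 6, which matches `sum(fit$strata)`. This mirrors the 308 versus 312 discrepancy above.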

Agustin
  • A reproducible example would help to clarify this. However, reading the description I would assume that you have multiple events with the same time. – kath Nov 07 '19 at 11:18
  • Have updated with an example – Agustin Nov 07 '19 at 11:44
  • Please do not post data as an external link; use `dput` or similar instead. You can read more on this [here](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) – kath Nov 07 '19 at 11:53
  • I still think it is the duplicate times per curve! So although you have more duplicated times in your complete data, when this is split up only some of them remain duplicated. – kath Nov 07 '19 at 11:53
  • Yes, that is correct. I just tried that. Do you want to add it as an answer? – Agustin Nov 07 '19 at 11:54
  • You can answer it yourself now ;) – kath Nov 07 '19 at 11:55
  • I will also have a look at the proper way of adding data; I'd never done it before. Thanks! – Agustin Nov 07 '19 at 11:56

1 Answer


Thank you to @kath for their help.

`n` refers to how many subjects are in each group.

`strata` refers to the number of distinct time points in each group, i.e. the number of times left after removing duplicate times within each group.
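As far as I can tell, "duplicates" here means any repeated time within a group, whether those observations are events or censorings. A minimal sketch with hypothetical data (`toy` and `curves` are illustrative names) where an event and a censoring share t = 3 in group A; they collapse into a single row of the fitted curve that records both:

```r
library(survival)

# Hypothetical toy data: in group A one subject has an event at t = 3 and
# another is censored at t = 3 (a tie between an event and a censoring).
toy <- data.frame(
  Time  = c(1, 3, 3, 5,   2, 6),
  Event = c(1, 1, 0, 0,   1, 1),
  Group = c("A", "A", "A", "A", "B", "B")
)

fit <- survfit(Surv(Time, Event) ~ Group, data = toy)

# One row per distinct time within each curve, labelled with its curve name
curves <- data.frame(
  group    = rep(names(fit$strata), fit$strata),
  time     = fit$time,
  n.risk   = fit$n.risk,
  n.event  = fit$n.event,
  n.censor = fit$n.censor
)
print(curves)
```

Here `fit$n` is 4 and 2 while `fit$strata` is 3 and 2: the tied row at t = 3 in group A carries `n.event = 1` and `n.censor = 1`, and the censoring-only time t = 5 still gets its own row.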

Agustin