I have a data frame in R consisting of an LOS column and a Condition column holding multiple broader conditions:

LOS             Condition
  1                Spinal
  2               Urology
  1              Thoracic
  8                Spinal
  5               Billary
 ...                  ...

I'd like to find the variance of the LOS for each of the broader conditions. Is there a simple way to do this?

Any advice would be appreciated, thanks!

A similar reproducible dataset is below:

data <- structure(list(LOS = c(6, 6, 13, 6, 19, 7), Condition = structure(c(37L, 15L, 24L, 15L, 15L, 15L), .Label = c("Acute Liver Failure", "Aortic Disease", "Arthritis and Limb Deformity/Fractures", "Asphyxiation", "Billary", "Bowel Infection/Perforation/Infarction", "Breast Cancer", "Cancer (Unoperated)", "Cardiac Arrest", "Cardiac Arythmia", "Cerebral Aneurysm (Non-Ruptured)", "Cerebral Infarction", "Cerebral Oedema", "Chronic Liver Disease", "COPD/Asthma/Respiratory Failure", "Drug Overdose and Poisoning", "Ear/Nose/Throat", "Electrolyte", "Encephalitis", "Endocrine", "Epilepsy", "Gastroectomy", "Gynaecological Cancer/Surgery", "Heart Failure", "Hydrocephalus", "Hyperventilation Syndromes", "Infection incl. unspecified", "Influenza", "Interstitial Pulmonary Disease", "Large Bowel Cancer", "Max Fax Surgeries", "Meningitis", "Myocardial Infarction", "Neuro-Surgical Cancer", "Obesity", "Other Inter-Cerebral Haemmorhage", "Pancreatitis", "Perforation of Oesophagus", "Peripheral Vascular Disease (Inlc. Ischaemia and Infarction", "Pleural Effusion", "Pneumonia", "Psychiatric", "Pulmonary/Veno-Thrombo Embollism", "Skin Inflammation/Infection", "Skull and Facial Fractures", "Spinal Cord Weakness", "Spinal Surgery/Fractures", "Spinal Trauma", "Sub-Arachnoid Haemmorhage", "Systemic Weakness", "Thoracic/Abdominal Aortic Aneurysm (Non-Ruptured)", "Thoracic/Abdominal Aortic Aneurysm (Ruptured incl. injury)", "Trauma to Intra-Abdominal Organs/Vessels", "Trauma to Thoracic Cage", "Traumatic Inter-Cerebral Haemmorhage/Contusions/Oedema", "Urology/Renal Surgery" ), class = "factor")), .Names = c("LOS", "Condition"), row.names = c(NA, 6L), class = "data.frame")

Rowley058
  • `aggregate(LOS~Condition, data=data, FUN=var)` should work, or `data[, .(var(LOS)), by="Condition"]` for a data.table. – lmo Aug 04 '16 at 11:21
  • Of course, yes, this is right. Thank you. – Rowley058 Aug 04 '16 at 11:32
  • Possible duplicate of [Aggregate a dataframe on a given column and display another column](http://stackoverflow.com/questions/6289538/aggregate-a-dataframe-on-a-given-column-and-display-another-column) – Sotos Aug 04 '16 at 11:41

1 Answer

This creates a new data.frame with the results:

# Empty result table; condition is kept as character and varLos as numeric
res <- data.frame(condition = character(0), varLos = numeric(0), stringsAsFactors = FALSE)
for (i in unique(as.character(data$Condition))) {
  # Append one row per condition; list() keeps varLos numeric instead of coercing it to character
  res[nrow(res) + 1, ] <- list(i, var(data[data$Condition == i, "LOS"], na.rm = TRUE))
}
res
#                         condition   varLos
# 1                    Pancreatitis       NA
# 2 COPD/Asthma/Respiratory Failure 40.33333
# 3                   Heart Failure       NA

The NA values appear because the sample variance is undefined for a condition with only one observation. With your full data set (which presumably holds more observations per condition), these should largely disappear.
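
To make the NA behaviour concrete, here is a minimal sketch in base R. The variable names `single`, `los`, and `manual` are made up for illustration; the numbers are the LOS values from the sample data in the question.

```r
# Sample variance divides by (n - 1), so it is undefined for a single value.
single <- c(6)     # the lone Pancreatitis observation in the sample data
var(single)        # NA

# The four COPD/Asthma/Respiratory Failure observations, computed by hand
los <- c(6, 6, 19, 7)
manual <- sum((los - mean(los))^2) / (length(los) - 1)

# The hand-computed value agrees with var()
all.equal(manual, var(los))
```

So any condition that occurs only once will show NA, whichever grouping approach is used.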

loki
  • This works great, thank you. However, @lmo's code is a little easier to use; I had forgotten about aggregate completely. – Rowley058 Aug 04 '16 at 11:41
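
For comparison, the `aggregate()` approach from the comments can be sketched on the sample data as follows. This is only an illustration: `dat` is a trimmed stand-in for the question's data frame (same six rows, unused factor levels left out), and the `tapply()` variant and the filtering step are additions that were not part of the original thread.

```r
# Trimmed stand-in for the question's data: same six rows, unused levels omitted
dat <- data.frame(
  LOS = c(6, 6, 13, 6, 19, 7),
  Condition = factor(c("Pancreatitis",
                       "COPD/Asthma/Respiratory Failure",
                       "Heart Failure",
                       "COPD/Asthma/Respiratory Failure",
                       "COPD/Asthma/Respiratory Failure",
                       "COPD/Asthma/Respiratory Failure"))
)

# One row per condition; conditions with a single observation get NA
by_cond <- aggregate(LOS ~ Condition, data = dat, FUN = var)
by_cond

# Same idea with tapply(), which returns a vector named by condition
v <- tapply(dat$LOS, dat$Condition, var)

# Optionally keep only conditions with at least two observations
n <- table(dat$Condition)
v[names(n)[n >= 2]]
```

`aggregate()` returns a data frame, which is convenient for merging back into other tables, while `tapply()` is handy when a simple named vector is enough.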