"n" was missing because that value is stored in numSummaryObj$n, while the other summary statistics are stored in numSummaryObj$table.
Putting it back requires a simple cbind or data.frame call:
file <- "https://vincentarelbundock.github.io/Rdatasets/csv/datasets/ToothGrowth.csv"
toothGrowth <- read.table(file, header=TRUE, sep=",", row.names=1, na.strings="NA", dec=".", strip.white=TRUE)
numSumTooth <- RcmdrMisc::numSummary(toothGrowth[, c("len", "dose")])
nST <- data.frame(numSumTooth$table, numSumTooth$n)
names(nST) <- c(colnames(numSumTooth$table), "n")
write.csv(nST, "numSumTooth.csv")
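The names(nST) line above is needed because data.frame() rewrites column names such as 0% into syntactic ones by default. As a minimal sketch of two ways around that, the snippet below builds a stand-in for the table and n components with base R (so it runs without RcmdrMisc; tbl, n, nST_cbind and nST_df are illustrative names, not part of numSummary) and combines them while keeping the original labels:

```r
# Stand-in for numSumTooth$table and numSumTooth$n, computed with base R
# (an assumption for illustration; the real table also has other columns).
vars <- ToothGrowth[, c("len", "dose")]
tbl <- t(sapply(vars, function(v) c(mean = mean(v), sd = sd(v), quantile(v))))
n <- sapply(vars, length)

# cbind() on a matrix leaves column names such as "0%" untouched
nST_cbind <- cbind(tbl, n = n)

# data.frame() leaves them untouched as well when check.names = FALSE
nST_df <- data.frame(tbl, n = n, check.names = FALSE)

colnames(nST_df)
```

With check.names = FALSE the separate renaming step should no longer be necessary.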
==
EDIT:
I would personally invest some time in data handling with packages like dplyr and tidyr, as they give you a lot of mileage and flexibility in the future. For instance, to generate the same numSummary output as a data.frame, you can run the following:
library(dplyr)
library(tidyr)

toothGrowth %>%
  select(-supp) %>%
  gather(var, val) %>% # convert the wide data frame to long form, with var = "dose" or "len"
  group_by(var) %>%
  summarise(mean = mean(val), sd = sd(val),
            IQR = IQR(val),
            `0%` = min(val),
            `25%` = quantile(val, 0.25),
            `50%` = median(val),
            `75%` = quantile(val, 0.75),
            `100%` = max(val),
            n = n())
# A tibble: 2 × 10
var mean sd IQR `0%` `25%` `50%` `75%` `100%` n
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <int>
1 dose 1.166667 0.6288722 1.5 0.5 0.500 1.00 2.000 2.0 60
2 len 18.813333 7.6493152 12.2 4.2 13.075 19.25 25.275 33.9 60
The added flexibility of this approach is that you can compute the same statistics for each group (such as supp in this case):
toothGrowth %>%
  # select(-supp) %>%
  gather(var, val, -supp) %>%
  group_by(supp, var) %>%
  summarise(mean = mean(val), sd = sd(val),
            IQR = IQR(val),
            `0%` = min(val),
            `25%` = quantile(val, 0.25),
            `50%` = median(val),
            `75%` = quantile(val, 0.75),
            `100%` = max(val),
            n = n())
Source: local data frame [4 x 11]
Groups: supp [?]
supp var mean sd IQR `0%` `25%` `50%` `75%` `100%` n
<fctr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <int>
1 OJ dose 1.166667 0.6342703 1.5 0.5 0.500 1.0 2.000 2.0 30
2 OJ len 20.663333 6.6055610 10.2 8.2 15.525 22.7 25.725 30.9 30
3 VC dose 1.166667 0.6342703 1.5 0.5 0.500 1.0 2.000 2.0 30
4 VC len 16.963333 8.2660287 11.9 4.2 11.200 16.5 23.100 33.9 30
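In more recent tidyr releases, gather() is marked as superseded in favour of pivot_longer(). As a hedged sketch, the grouped summary could be written as follows, assuming tidyr 1.0.0 or later and dplyr 1.0.0 or later (for the .groups argument). It uses the built-in ToothGrowth data set and only a few of the statistics to stay short; res is an illustrative name:

```r
library(dplyr)
library(tidyr)

res <- ToothGrowth %>%
  # long form: one row per (supp, variable, value); supp is kept as an id column
  pivot_longer(-supp, names_to = "var", values_to = "val") %>%
  group_by(supp, var) %>%
  # .groups = "drop" returns an ungrouped tibble
  summarise(mean = mean(val), sd = sd(val), n = n(), .groups = "drop")

res
```

The remaining statistics can be added to summarise() in the same way as in the gather() version.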
==
Another alternative (if you feel that writing the long summarise syntax repeatedly is a chore) is to create a function, e.g.:
checkVar <- function(varname, data){
  # [[ ]] extracts the column as a plain vector for data frames and tibbles alike
  val <- data[[varname]]
  tmp <- data.frame(mean = mean(val),
                    sd = sd(val),
                    IQR = IQR(val),
                    `0%` = min(val),
                    `25%` = quantile(val, 0.25),
                    `50%` = median(val),
                    `75%` = quantile(val, 0.75),
                    `100%` = max(val),
                    n = length(val))
  # data.frame() rewrites names such as `0%` into syntactic ones, so set the labels explicitly
  names(tmp) <- c("mean", "sd", "IQR", "`0%`", "`25%`", "`50%`", "`75%`", "`100%`", "n")
  rownames(tmp) <- varname
  return(tmp)
}
Calling the custom function returns the summary statistics for a single variable:
checkVar("dose", ToothGrowth)
mean sd IQR `0%` `25%` `50%` `75%` `100%` n
dose 1.166667 0.6288722 1.5 0.5 0.5 1 2 2 60
Putting them into a single data.frame involves an apply-family function, e.g. lapply, combined with do.call and rbind:
do.call(rbind, lapply(c("dose", "len"), checkVar, data=ToothGrowth))
mean sd IQR `0%` `25%` `50%` `75%` `100%` n
dose 1.166667 0.6288722 1.5 0.5 0.500 1.00 2.000 2.0 60
len 18.813333 7.6493152 12.2 4.2 13.075 19.25 25.275 33.9 60
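The function-based approach can also produce per-group summaries, much like the group_by(supp, var) example. The sketch below is one possible way to do it with split() and lapply(). It restates checkVar in a compact variant so the snippet runs on its own (this variant uses check.names = FALSE, so the column labels are 0%, 25%, and so on without backticks), and bySupp is an illustrative name:

```r
# Compact variant of checkVar(), restated so this snippet is self-contained
checkVar <- function(varname, data) {
  val <- data[[varname]]
  out <- data.frame(mean = mean(val), sd = sd(val), IQR = IQR(val),
                    t(quantile(val)),   # 1-row matrix with columns 0% ... 100%
                    n = length(val), check.names = FALSE)
  rownames(out) <- varname
  out
}

# split() returns a named list with one data frame per level of supp;
# each piece gets its own dose/len summary table
bySupp <- lapply(split(ToothGrowth, ToothGrowth$supp), function(d) {
  do.call(rbind, lapply(c("dose", "len"), checkVar, data = d))
})

bySupp$OJ
```

Each element of bySupp is a data.frame with one row per variable, so bySupp$VC gives the corresponding table for the other supplement.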