It seems you want us to do every step of an initial exploratory data analysis for you. In future posts, instead of requesting code like this, first show your problem with reproducible code, show the results of your attempts, and ask specific questions about your doubts. That said, let's look at your question:
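You can use sapply to loop over the columns and return the median, mean, Q1 and Q3 of each one. The columns must be numeric, because median and quantile stop with an error on factor columns.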
sapply(yourdataframe, median) #will return a vector with the medians of every column
Similarly,
sapply(yourdataframe, quantile, 0.25) #will return a vector with all the first quartiles
sapply(yourdataframe, quantile, 0.75) #will return a vector with all the third quartiles
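As a minimal sketch of the same idea, the calls above can be combined into one sapply call that is restricted to the numeric columns. The built-in iris data stands in for your data frame here, and the names num and summary_stats are chosen only for illustration:

```r
# Keep only the numeric columns; median() and quantile() fail on factors
num <- iris[sapply(iris, is.numeric)]

# One row per statistic, one column per variable
summary_stats <- sapply(num, function(col) {
  c(mean   = mean(col, na.rm = TRUE),
    median = median(col, na.rm = TRUE),
    Q1     = unname(quantile(col, 0.25, na.rm = TRUE)),
    Q3     = unname(quantile(col, 0.75, na.rm = TRUE)))
})

round(summary_stats, 2)
```

Because every column returns a vector of the same length, sapply simplifies the result to a matrix, so a single value can be read with summary_stats["median", "Sepal.Length"].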
You may want to write a function that integrates all that in a single call, like this:
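descriptive <- function(x = data.frame(), digits = 2, na.rm = TRUE, normality_test = "shapiro") {
    # One entry per column of x; non-numeric columns are filled with NA
    is.normal <- logical()
    medians <- numeric()
    Q1 <- numeric()
    Q3 <- numeric()
    means <- numeric()
    SDs <- numeric()
    output <- character()
    for (i in seq_along(x)) {
        values <- x[[i]]  # [[ ]] also works when x is a tibble
        if (is.numeric(values)) {
            medians[i] <- round(median(values, na.rm = na.rm), digits = digits)
            Q1[i] <- round(quantile(values, 0.25, na.rm = na.rm), digits = digits)
            Q3[i] <- round(quantile(values, 0.75, na.rm = na.rm), digits = digits)
            means[i] <- round(mean(values, na.rm = na.rm), digits = digits)
            SDs[i] <- round(sd(values, na.rm = na.rm), digits = digits)
            # Normality test; shapiro.test needs between 3 and 5000 observations
            if (normality_test == "shapiro") {
                p.value <- shapiro.test(values)$p.value
            } else if (normality_test == "ks") {
                p.value <- ks.test(values, "pnorm", means[i], SDs[i])$p.value
            } else {
                stop("normality_test must be \"shapiro\" or \"ks\"")
            }
            if (p.value <= 0.05) {
                # Normality rejected: report median (Q1-Q3)
                is.normal[i] <- FALSE
                output[i] <- paste0(medians[i], " (", Q1[i], "-", Q3[i], ")")
            } else {
                # Normality not rejected: report mean +- SD
                is.normal[i] <- TRUE
                output[i] <- paste0(means[i], " +-", SDs[i])
            }
        } else {
            is.normal[i] <- NA
            means[i] <- NA
            medians[i] <- NA
            Q1[i] <- NA
            Q3[i] <- NA
            SDs[i] <- NA
            output[i] <- NA
        }
    }
    # rbind coerces every row to character, so the result is meant for display
    df <- data.frame(rbind("normal distr" = is.normal, "median" = medians, "Q1" = Q1, "Q3" = Q3, "mean" = means, "SD" = SDs, "output" = output))
    names(df) <- colnames(x)
    df
}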
As an example:
> descriptive(iris, normality_test="shapiro")
              Sepal.Length Sepal.Width   Petal.Length   Petal.Width Species
normal distr         FALSE        TRUE          FALSE         FALSE    <NA>
median                 5.8           3           4.35           1.3    <NA>
Q1                     5.1         2.8            1.6           0.3    <NA>
Q3                     6.4         3.3            5.1           1.8    <NA>
mean                  5.84        3.06           3.76           1.2    <NA>
SD                    0.83        0.44           1.77          0.76    <NA>
output       5.8 (5.1-6.4) 3.06 +-0.44 4.35 (1.6-5.1) 1.3 (0.3-1.8)    <NA>
There are several ways to subset your data by the values of a categorical variable before analysis. Check dplyr's filter and group_by functions.
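A hedged sketch of both approaches, assuming the dplyr package is installed and using iris with its Species column as the categorical variable. The names setosa_only and by_species are illustrative only:

```r
library(dplyr)

# filter(): keep the rows of a single category
setosa_only <- iris %>%
  filter(Species == "setosa")

# group_by() + summarise(): the same statistics for every category at once
by_species <- iris %>%
  group_by(Species) %>%
  summarise(
    median = median(Sepal.Length),
    Q1     = quantile(Sepal.Length, 0.25, names = FALSE),
    Q3     = quantile(Sepal.Length, 0.75, names = FALSE),
    mean   = mean(Sepal.Length),
    SD     = sd(Sepal.Length)
  )

by_species
```

This summarises a single variable, Sepal.Length. The result has one row per species, and setosa_only can be passed to any function that expects a regular data frame.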