Essentially, the assignment is to find the mean, SD, p-value, and number of people for a continuous variable within each category of a categorical variable.
For example, I have a continuous variable, BMI, that holds each patient's BMI, and the assignment asks for the mean and SD of BMI within the "No diabetes" group and the "Diabetes" group of a categorical variable.
The first variable lists the BMI of each patient. The second variable indicates whether the individual has diabetes: 1 and 2 are for type 1 and type 2 diabetes, and 3 is for no diabetes.
My assignment is to get the p-value, number of individuals, mean, and standard deviation of BMI for individuals with diabetes and for individuals without diabetes, after removing anyone with missing information.
I have tried:
mean(ds$bmi[ds$diabetesI==1|ds$diabetesI==2])
However, this returns NA. My idea was to get the mean BMI for individuals with type 1 or type 2 diabetes, but it did not work.
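One possible explanation, assuming the full data contain missing values (the sample posted here does not), is that mean() returns NA as soon as any selected element is NA, and that indexing with `==` on a column containing NA also yields NA elements. The sketch below uses a made-up toy data frame, with column names borrowed from the posted data, to show the effect and one way around it.

```r
# Toy data, made up for illustration only; not the real data set
toy <- data.frame(
  bmi_list = c(23.5, 30.2, NA, 29.5, 32.6, 35.0),
  diab4    = c(1L, 2L, 1L, 3L, NA, 3L)
)

# Rows with type 1 or type 2 diabetes.
# %in% gives FALSE for a missing status, whereas == would give NA.
has_diabetes <- toy$diab4 %in% c(1, 2)

# Without na.rm, one missing BMI in the selection makes the mean NA
mean_with_na <- mean(toy$bmi_list[has_diabetes])

# With na.rm = TRUE, missing BMI values are dropped before averaging
mean_without_na <- mean(toy$bmi_list[has_diabetes], na.rm = TRUE)

print(mean_with_na)
print(mean_without_na)
```

The same `na.rm = TRUE` argument is accepted by sd(), so the standard deviation can be handled the same way.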
Data:
ds <- structure(list(bmi_list = c(23.56748874, 30.2897933, 26.79150092,
29.52347213, 32.60591716, 35.04961743, 21.41223797, 27.46530314,
28.73467206, 21.19391994, 25.59362916, 27.62345679, 34.45651021,
27.48650005, 31.49548668, 26.05817112, 35.83864796, 31.42131479,
22.49134948, 33.99585346, 23.67125363, 22.55335653, 29.41248346,
32.94855347, 23.2915562, 30.37962963, 23.759308, 25.2493372,
29.27315022, 35.26197253), diab4 = c(1L, 1L, 3L, 1L, 1L, 3L,
1L, 1L, 1L, 3L, 1L, 1L, 1L, 1L, 3L, 3L, 3L, 1L, 3L, 1L, 1L, 1L,
3L, 1L, 3L, 1L, 1L, 1L, 1L, 3L)), row.names = c(1L, 2L, 3L, 4L,
5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L,
19L, 20L, 21L, 22L, 23L, 24L, 25L, 27L, 28L, 30L, 31L, 32L), class =
"data.frame")