I have a large data set for which I need descriptive statistics for the cases and the controls, so that I can compare the two groups. For example, I have 997 females and 1139 males, but I need to know how many of the females are cases and how many are controls (controls = 0, cases = 1). I want to keep all my other variables and just split the rows into two groups. I have tried the split() function and I have tried creating a subset(), but I still can't work out how to get R to show me the different groups. I am relatively new to R but need to use it to analyse my master's dissertation data.
-
https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example - please share data and code so we can help more efficiently – Mike May 10 '21 at 20:33
-
How do I do that in the simplest way possible? I don't think I can share the whole thing. – Sophie May 10 '21 at 20:39
-
For your data, try something like `dput(head(df,100))`, then copy and paste the output of this function into your question. `df` in this example is a placeholder for your dataset – Mike May 10 '21 at 20:42
-
then add your code as well – Mike May 10 '21 at 20:43
-
Okay, so even that is massive. It's a data set of 2136 responses to 153 variables. The variables are mixed: some are numeric (e.g. age) and some are factors (e.g. gender). The data are responses to questionnaires and are mostly descriptive. – Sophie May 10 '21 at 20:46
-
Code at the minute is pretty basic, e.g. `age<-na.omit(Sophie$mrc1_socde02)` or `Gender<-Sophie$mrc1_socde01` followed by `Gender2<-as.factor(Gender)` – Sophie May 10 '21 at 20:48
-
OK, so for your question I would include only the columns needed to reproduce the problem. From your text it seems you can share the case/control column, gender and maybe a few others to demonstrate your point. Also, the link I shared will help you tremendously – Mike May 10 '21 at 20:49
-
Something like `table(data$sex, data$case_control)` will cross-tabulate different columns for you. – dash2 May 10 '21 at 21:07
1 Answer
I don't know if I understand correctly, but if you want to split your data based on a condition, it is really simple. Since you did not provide any example data, here is an example on a dummy data.frame:
# dummy data: the column `control` holds the case/control status (0 = control, 1 = case)
df <- data.frame(gender=sample(c("M","F"),1000,replace = TRUE),control=sample(c(0,1),1000,replace = TRUE),other.var=runif(1000))
# split the rows by status; all other columns are kept
control <- df[df$control==0,]
cases <- df[df$control==1,]
# if you want the female controls
f.control <- control[control$gender=="F",]
# same for the male controls
m.control <- control[control$gender=="M",]
# same for the female and male cases
f.cases <- cases[cases$gender=="F",]
m.cases <- cases[cases$gender=="M",]
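If the goal is mainly to count and summarise the groups rather than to keep separate data frames, the cross-tabulation suggested in the comments may be enough. Below is a minimal sketch in base R. The data and the column names (`gender`, `case_control`, `age`) are made up for illustration and are not taken from the asker's data set.

```r
# Hypothetical toy data: column names are illustrative only
set.seed(42)
df <- data.frame(
  gender = factor(sample(c("F", "M"), 1000, replace = TRUE)),
  case_control = sample(c(0, 1), 1000, replace = TRUE),  # 0 = control, 1 = case
  age = round(runif(1000, 18, 80))
)

# Number of females and males among controls (0) and cases (1)
counts <- table(gender = df$gender, case_control = df$case_control)
print(counts)

# split() returns a list with one data frame per group, keeping every column
groups <- split(df, df$case_control)
controls <- groups[["0"]]
cases <- groups[["1"]]

# Mean age for each combination of gender and case/control status
age_means <- aggregate(age ~ gender + case_control, data = df, FUN = mean)
print(age_means)
```

Here `table()` gives the counts per group, `split()` keeps all variables while separating the rows, and `aggregate()` computes one descriptive statistic per group. Other summaries (for example `sd` or `median`) can be passed as `FUN` in the same way.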

Elia