I have a dataset with the following structure:
A variable "Class" takes the values 1, ..., 50. Each class has multiple observations, ranging from 2000 (in class 1) down to 200 (in class 50), and the variables Age, Sex, and HIV are recorded for each individual in each class.
What I need to do is create a new dataset from the original one in which each row corresponds to one value of "Class" (so 50 rows, instead of the roughly 10000 rows in the original dataset), together with the variables listed above.
I'm new to R, so I'm not sure how to squeeze(?!) the data so that, for example, row 1 represents class 1 but carries the Age, Sex, and HIV information for all 2000 individuals in that class!
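To make the "one row per class" idea concrete, here is a minimal base R sketch of the most literal reading: each class becomes one row, and Age, Sex, and HIV become list columns holding every individual's value. The data frame `toy`, its values, and the name `by_class` are invented for illustration and are not the real dataset.

```r
# Toy stand-in for the real data (all values invented)
set.seed(1)
toy <- data.frame(
  Class = rep(1:3, times = c(4, 2, 3)),
  Age   = round(rnorm(9, mean = 20, sd = 3)),
  Sex   = rbinom(9, 1, 0.4),
  HIV   = rbinom(9, 1, 0.7)
)

# One row per class; n is the number of individuals in the class
by_class   <- data.frame(Class = sort(unique(toy$Class)))
by_class$n <- as.vector(table(toy$Class))

# List columns: each cell holds the values of all individuals in that class
by_class$Age <- split(toy$Age, toy$Class)
by_class$Sex <- split(toy$Sex, toy$Class)
by_class$HIV <- split(toy$HIV, toy$Class)

nrow(by_class)      # number of classes
by_class$Age[[1]]   # every Age value recorded for class 1
```

A layout like this keeps all the individual values, but a model such as glm cannot use list columns directly, so some per-class summary would probably still be needed.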
I need this new dataset because I am writing a function (a glm) whose source data should not be the original data; it should be based on the classes! The predictions of this glm, however, will be made at the individual level (on the original data).
Can anyone kindly give me a hand or a hint on this?
Here is a small-scale example of what the data look like:
library(simstudy)

# First batch: class-level definitions with a fixed cluster size of 5
Class <- defData(varname = "Class", dist = "categorical", formula = "0.8;0.2", id = "Class1")
Class <- defData(Class, varname = "Classic", dist = "categorical", formula = "0.8;0.2")
Class <- defData(Class, varname = "clustersize", dist = "normal", formula = "5", variance = 0)
d1 <- genData(1, Class)
d1
# Expand the class-level row into individual-level rows
dF1 <- genCluster(d1, cLevelVar = "Class", numIndsVar = "clustersize", level1ID = "Class1")
dF1

# Second batch: five possible class levels, cluster size drawn from a zero-truncated Poisson
Class2 <- defData(varname = "Class", dist = "categorical", formula = "0.3;0.2;0.1;0.3;0.1", id = "Class1")
Class2 <- defData(Class2, varname = "Classic", dist = "categorical", formula = "0.3;0.2;0.1;0.3;0.1")
Class2 <- defData(Class2, varname = "clustersize", dist = "noZeroPoisson", formula = "3")
d2 <- genData(3, Class2)
d2
dF2 <- genCluster(d2, cLevelVar = "Class", numIndsVar = "clustersize", level1ID = "Class1")
dF2

# Stack both batches, then add the individual-level covariates
d <- rbind(dF1, dF2)
v <- defDataAdd(varname = "Age", dist = "normal", formula = "20", variance = 10)
v <- defDataAdd(v, varname = "Sex", dist = "binary", formula = "0.4", link = "logit")
v <- defDataAdd(v, varname = "HIV", dist = "binary", formula = "0.7", link = "logit")
d <- addColumns(v, d)

# Binary outcome Y whose log-odds depend on Age, Sex, and HIV
Y <- defDataAdd(varname = "Y", dist = "binary", formula = "0.1*Age+0.2*Sex+0.5*HIV", link = "logit")
d <- addColumns(Y, d)
d
Let me put it this way: "d" is the original dataset, with 16 rows (individuals) according to the code above. I want to model Y by Age, Sex, and HIV, but the data this model uses should not be "d"; it has to be a new dataset derived from "d" with only 3 rows (because I have 3 classes). My confusion is how to create this new dataset from "d" when I have 11 individuals in class 1, 2 individuals in class 2, and 3 individuals in class 3. I would then fit the model on this new dataset and use it to predict on the original dataset "d".
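The workflow described above could be sketched roughly as follows. It assumes that "based on classes" means summarising each class by its mean Age, its proportions of Sex and HIV, and its count of Y = 1, which is only one possible reading. The data frame `toy`, the class sizes, and the coefficients are invented stand-ins for "d". The toy uses 10 classes rather than 3 because a model with three predictors plus an intercept has four coefficients and cannot be estimated from only 3 rows.

```r
# Toy stand-in for `d`: 10 classes of unequal size (all values invented)
set.seed(42)
sizes <- c(30, 25, 20, 18, 15, 12, 10, 8, 6, 5)
n_ind <- sum(sizes)
toy <- data.frame(
  Class = rep(seq_along(sizes), times = sizes),
  Age   = rnorm(n_ind, mean = 20, sd = sqrt(10)),
  Sex   = rbinom(n_ind, 1, 0.4),
  HIV   = rbinom(n_ind, 1, 0.7)
)
# Outcome from an arbitrary logistic model, chosen only to give a mix of 0s and 1s
toy$Y <- rbinom(n_ind, 1, plogis(0.1 * toy$Age + 0.2 * toy$Sex + 0.5 * toy$HIV - 2.5))

# Class-level dataset: one row per class with the class means of the covariates
class_df <- aggregate(cbind(Age, Sex, HIV) ~ Class, data = toy, FUN = mean)
class_df$n      <- as.vector(table(toy$Class))                # individuals per class
class_df$events <- as.vector(tapply(toy$Y, toy$Class, sum))   # count of Y == 1 per class

# Fit the glm on the class-level rows (successes and failures per class)
fit <- glm(cbind(events, n - events) ~ Age + Sex + HIV,
           family = binomial, data = class_df)

# Predict at the individual level, using each person's own covariates
toy$pred <- predict(fit, newdata = toy, type = "response")
head(toy)
```

Using `cbind(events, n - events)` as the response lets larger classes carry more weight in the fit than smaller ones. Whether class means are an appropriate summary depends on the actual purpose of the model.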