I have already searched the Forum for Hours (really) and start to get the faint Feeling that I am slowly going crazy, especially as it appears to me to be a really easily solvable Problem.
What do I want to do?
Basically, I want to simulate clinical data. Specifically, for each Patient (column 1:ID) an arbitrary score (column 3: score), dependant on the assigned Treatment Group (column 2: group).
set.seed(123)
# Number of subjects in study
n_patients = 1000
# Score: Mean and SDs
mean_verum = 70
sd_verum = 20
mean_placebo = 40
sd_placebo = 20
# Allocating to Treatment groups:
data = data.frame(id = as.character(1:n_patients))
data$group[1:(n_patients/2)] <- "placebo"
data$group[(n_patients/2+1):n_patients] <- "verum"
# Attach Score for each treatment group
data$score <- ifelse(data$group == "verum", rnorm(n=100, mean=mean_verum, sd=sd_verum), rnorm(n=100, mean=mean_placebo, sd=sd_placebo))
So far so easy. Now, I wish to 1) calculate a probability of an Event happening (logit function) depending on the score. Then, 2) I want to actually assign an Event, depending on the probability (rbinom).
I want to do this for n different probablities/Events. This is the Code I've used so far:
Calculate probabilities:
a = -1
b = 0.01
p1 = 1-exp(a+b*data$score)/(1+exp(a+b*data$score))
data$p_AE1 <- p1
a = -0.5
b = 0.01
p1 = 1-exp(a+b*data$score)/(1+exp(a+b*data$score))
data$p_AE2 <- p1
…
Assign Events:
data$Abbruch_AE1 <- rbinom(n_patients, 1, data$p_E1)
data$Abbruch_AE2 <- rbinom(n_patients, 1, data$p_E2)
…
Obviously, this is really inefficient, as it would like to easily scale this up or down, depending on how many probabilities/Events I want to simulate.
The Problem is, I simply do not get it, how I can simultaneously a) generate new, single column in the dataframe, where I want to put in the values for each, b) perform the function to assign the probabilities/Events, and c) do this for a number n of different formulas, which have their specific a and b.
I am sure the solution to this Problem is a simple one - what I didn't manage was to do all These Things at once, which is were I would like this to be eventually. I ahve played around with for loops, all to no avail.
Any help would be greatly appreciated!
This how my dataframe Looks like:
structure(list(id = structure(1:3, .Label = c("1", "2", "3"), class = "factor"),
group = c("placebo", "placebo", "placebo"), score = c(25.791868726014,
45.1376741831306, 35.0661624307525), p_AE1 = c(0.677450814266315,
0.633816117436442, 0.656861351663365), p_AE2 = c(0.560226492151216,
0.512153420188678, 0.537265362130761), p_AE3 = c(0.435875409622676,
0.389033483248856, 0.413221988111604), p_AE4 = c(0.319098312196655,
0.278608032377073, 0.299294085148527), p_AE5 = c(0.221332386680766,
0.189789774534235, 0.205762225373345), p_AE6 = c(0.147051201194953,
0.124403316086538, 0.135795233451071), p_AE7 = c(0.0946686004658072,
0.0793379289917946, 0.0870131973838217), p_AE8 = c(0.0596409872667201,
0.0496714832182721, 0.0546471270895262), AbbruchAE1 = c(1L,
1L, 1L), AbbruchAE2 = c(1L, 1L, 0L), AbbruchAE3 = c(0L, 0L,
0L), AbbruchAE4 = c(0L, 1L, 0L), AbbruchAE5 = c(1L, 0L, 0L
), AbbruchAE6 = c(1L, 0L, 0L), AbbruchAE7 = c(0L, 0L, 0L),
AbbruchAE8 = c(0L, 0L, 0L)), .Names = c("id", "group", "score", "p_AE1", "p_AE2", "p_AE3", "p_AE4", "p_AE5", "p_AE6", "p_AE7", "p_AE8", "AbbruchAE1", "AbbruchAE2", "AbbruchAE3", "AbbruchAE4", "AbbruchAE5", "AbbruchAE6", "AbbruchAE7", "AbbruchAE8"), row.names = c(NA, 3L), class = "data.frame")