
I have been searching the forum for hours (really) and am starting to get the faint feeling that I am slowly going crazy, especially as this appears to be an easily solvable problem.

What do I want to do?

Basically, I want to simulate clinical data. Specifically, for each patient (column 1: id), I want an arbitrary score (column 3: score) that depends on the assigned treatment group (column 2: group).

set.seed(123)

# Number of subjects in study
n_patients = 1000

# Score: Mean and SDs

mean_verum = 70
sd_verum = 20

mean_placebo = 40
sd_placebo = 20

# Allocating to Treatment groups: 

data = data.frame(id = as.character(1:n_patients))
data$group[1:(n_patients/2)] <- "placebo"
data$group[(n_patients/2+1):n_patients] <- "verum"

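# Attach score for each treatment group
# Draw n_patients values per branch; with n = 100 the draws would be recycled every 100 rows
data$score <- ifelse(data$group == "verum", rnorm(n = n_patients, mean = mean_verum, sd = sd_verum), rnorm(n = n_patients, mean = mean_placebo, sd = sd_placebo))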
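For comparison, here is a minimal sketch of another way to draw one score per patient: pass a per-row mean and SD straight to `rnorm()`, which is vectorised over both arguments. The names `group`, `mu` and `sigma` are hypothetical helpers for this illustration, and the numbers are the means and SDs defined above.

```r
set.seed(123)

n_patients <- 1000
group <- rep(c("placebo", "verum"), each = n_patients / 2)

# Per-row mean and SD chosen by group (hypothetical helper vectors)
mu    <- ifelse(group == "verum", 70, 40)
sigma <- ifelse(group == "verum", 20, 20)

# rnorm() recycles nothing here: mu and sigma already have n_patients entries
score <- rnorm(n_patients, mean = mu, sd = sigma)

# Group means should land near 40 (placebo) and 70 (verum)
tapply(score, group, mean)
```

This draws exactly one value per row instead of two full vectors of which `ifelse()` keeps only half.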

So far so easy. Now, I wish to 1) calculate the probability of an event happening (logit function) depending on the score. Then, 2) I want to actually assign an event, depending on that probability (rbinom).

I want to do this for n different probabilities/events. This is the code I've used so far:

Calculate probabilities:

a = -1
b = 0.01
p1 = 1-exp(a+b*data$score)/(1+exp(a+b*data$score))
data$p_AE1 <- p1

a = -0.5
b = 0.01
p1 = 1-exp(a+b*data$score)/(1+exp(a+b*data$score))
data$p_AE2 <- p1

…

Assign Events:

# Use the probability columns created above (p_AE1, p_AE2)
data$Abbruch_AE1 <- rbinom(n_patients, 1, data$p_AE1)
data$Abbruch_AE2 <- rbinom(n_patients, 1, data$p_AE2)
…

Obviously, this is really inefficient, as I would like to easily scale this up or down, depending on how many probabilities/events I want to simulate.

The problem is that I simply do not understand how I can simultaneously a) generate a new, single column in the data frame to hold the values for each event, b) apply the function that assigns the probabilities/events, and c) do this for a number n of different formulas, each with its own a and b.

I am sure the solution to this problem is a simple one. What I didn't manage was to do all these things at once, which is where I would like this to be eventually. I have played around with for loops, all to no avail.
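To make the goal concrete, here is a minimal sketch of the structure described above. It assumes the intercepts and slopes are stored in two hypothetical vectors, `a_vec` and `b_vec`, and it uses a small toy data frame in place of the simulated data. `plogis()` from base R's stats package computes `exp(x)/(1+exp(x))`, so `1 - plogis(a + b*score)` is the same formula as in the code above.

```r
set.seed(123)

# Toy stand-in for the simulated data (scores are illustrative only)
n_patients <- 10
data <- data.frame(id = as.character(1:n_patients),
                   score = rnorm(n_patients, mean = 50, sd = 20))

# Hypothetical parameter vectors: one (a, b) pair per event
a_vec <- c(-1, -0.5)
b_vec <- c(0.01, 0.01)

for (i in seq_along(a_vec)) {
  # Probability column for event i: 1 - logistic(a + b * score)
  p <- 1 - plogis(a_vec[i] + b_vec[i] * data$score)
  data[[paste0("p_AE", i)]] <- p

  # Draw a 0/1 event per patient with that probability
  data[[paste0("Abbruch_AE", i)]] <- rbinom(n_patients, 1, p)
}

names(data)
```

Adding or removing events would then only mean changing the length of `a_vec` and `b_vec`; the `data[[name]]` form is what allows the column name to be built from a string.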

Any help would be greatly appreciated!

This is what my data frame looks like:

structure(list(id = structure(1:3, .Label = c("1", "2", "3"), class = "factor"), 
group = c("placebo", "placebo", "placebo"), score = c(25.791868726014, 
45.1376741831306, 35.0661624307525), p_AE1 = c(0.677450814266315, 
0.633816117436442, 0.656861351663365), p_AE2 = c(0.560226492151216, 
0.512153420188678, 0.537265362130761), p_AE3 = c(0.435875409622676, 
0.389033483248856, 0.413221988111604), p_AE4 = c(0.319098312196655, 
0.278608032377073, 0.299294085148527), p_AE5 = c(0.221332386680766, 
0.189789774534235, 0.205762225373345), p_AE6 = c(0.147051201194953, 
0.124403316086538, 0.135795233451071), p_AE7 = c(0.0946686004658072, 
0.0793379289917946, 0.0870131973838217), p_AE8 = c(0.0596409872667201, 
0.0496714832182721, 0.0546471270895262), AbbruchAE1 = c(1L, 
1L, 1L), AbbruchAE2 = c(1L, 1L, 0L), AbbruchAE3 = c(0L, 0L, 
0L), AbbruchAE4 = c(0L, 1L, 0L), AbbruchAE5 = c(1L, 0L, 0L
), AbbruchAE6 = c(1L, 0L, 0L), AbbruchAE7 = c(0L, 0L, 0L), 
AbbruchAE8 = c(0L, 0L, 0L)), .Names = c("id", "group", "score",  "p_AE1", "p_AE2", "p_AE3", "p_AE4", "p_AE5", "p_AE6", "p_AE7",  "p_AE8", "AbbruchAE1", "AbbruchAE2", "AbbruchAE3", "AbbruchAE4",  "AbbruchAE5", "AbbruchAE6", "AbbruchAE7", "AbbruchAE8"), row.names = c(NA,  3L), class = "data.frame")
Yves
  • Yves, it really helps to have representative data to start with. Please make this question *reproducible* with sample/representative data (e.g., `dput(head(data))`), and expected output. Refs: https://stackoverflow.com/questions/5963269, https://stackoverflow.com/help/mcve, and https://stackoverflow.com/tags/r/info. – r2evans Oct 27 '18 at 15:14
  • Hi Yves, check R's `data.table` package [here](https://cran.r-project.org/web/packages/data.table/vignettes/datatable-intro.html) – Pedro Schuller Oct 28 '18 at 12:37
