I've a large panel dataset in which for many participants, valid observations of a biomarker must be separated from invalid, each forming their own modal distribution.
I wish to apply the diptest for bimodality function to each set of observations, which I have separated using the split function. I need to run the dip.test 1:n times depending on the number of modes in each participants' observations, discarding those observations outside the intervals of the participants' modal, er, lumps. (It's clear which observations belong to the participant using the marker0 variable.) I apply the diptest to all observations for all participants by using the lapply and mutate functions in sequence.
The difficulty comes with repeating the diptest in the central nested function: I wish to exclude observations that each iteration shows not to belong to the participant. However, I haven't found a way to exclude those values in the repeating function (right now I'm just recoding them as x = 1. It'd be a simple case if I were only dealing with one data frame--does anyone know how I can limit my observations with each iteration?
# notional measures
y1 <- rnorm(80,mean=75, sd=3)
y2 <- rnorm(100,mean=100, sd=4)
y3 <- rnorm(40, mean=150, sd=2)
mark <- append(y1,y2)
marker <- append(mark,y3)
df_y <- as.data.frame(marker)
df_y$id <- 1
df_y$marker0 <-100
df_y$lump <- 0
df_y$other <-0
z1 <- rnorm(130,mean=50, sd=2)
z2 <- rnorm(110,mean=125, sd=5)
marker <- append(z1,z2)
df_z <- as.data.frame(marker)
df_z$id <- 2
df_z$marker0 <-130
df_z$lump <- 0
df_z$other <-0
df <- rbind(df_y, df_z)
# my function
trim.others <- function(x,a,b,c) {
repeat {
diptest <- dip.test(x, simulate.p.value = TRUE, B=500)
if (diptest$p.value > 0.1) break
hm <- classIntervals(x, n=2, style="jenks", method="complete")
b[(a < hm$brks[2])] <- 1
b[(a > hm$brks[2])] <- 2
c[(b == 1) & (x > hm$brks[2])] <- 1
c[(b == 2) & (x < hm$brks[2])] <- 1
x <- x[c==0]
}
return(x)
}
Note that the function works if I run it on a single id. Here's the multimodal distribution of the marker in id = 1, followed by the unimodal distribution of y2 in the notional dataset.
df1 <- df[(df$id==1),]
plot(density(df1$marker)
dfa <- trim.others(df1$marker,df1$marker0,df1$lump,df1$other)
plot(density(dfa))
I'd like to run my function trim.others across the entire dataset, with multiple ids. Here's the approach I took.
dfs <- split(df, f = df$id)
myfunct <- function(w,x,y,z) {
repeat {
diptest <- dip.test(x, simulate.p.value = TRUE, B=500)
if (diptest$p.value > 0.1) break
hm <- classIntervals(x, n=2, style="jenks", method="complete")
w[(y < hm$brks[2])] <- 1
w[(y > hm$brks[2])] <- 2
x[(w == 1) & (x > hm$brks[2])] <- 1
x[(w == 2) & (x < hm$brks[2])] <- 1
}
}
library(dplyr)
anotherfunct <- function(ia) {
mutate(ia, value = myfunct(lump,marker,marker0,other))
}
dothefunct <- lapply(df,function(i) {anotherfunct(i)})
Thank you, group.