2

I've a large panel dataset in which for many participants, valid observations of a biomarker must be separated from invalid, each forming their own modal distribution.

I wish to apply the diptest for bimodality function to each set of observations, which I have separated using the split function. I need to run the dip.test 1:n times depending on the number of modes in each participants' observations, discarding those observations outside the intervals of the participants' modal, er, lumps. (It's clear which observations belong to the participant using the marker0 variable.) I apply the diptest to all observations for all participants by using the lapply and mutate functions in sequence.

The difficulty comes with repeating the diptest in the central nested function: I wish to exclude observations that each iteration shows not to belong to the participant. However, I haven't found a way to exclude those values in the repeating function (right now I'm just recoding them as x = 1. It'd be a simple case if I were only dealing with one data frame--does anyone know how I can limit my observations with each iteration?

# notional measures

y1 <- rnorm(80,mean=75, sd=3)
y2 <- rnorm(100,mean=100, sd=4)
y3 <- rnorm(40, mean=150, sd=2)
mark <- append(y1,y2)
marker <- append(mark,y3)
df_y <- as.data.frame(marker)
df_y$id <- 1
df_y$marker0 <-100
df_y$lump <- 0
df_y$other <-0

z1 <- rnorm(130,mean=50, sd=2)
z2 <- rnorm(110,mean=125, sd=5)
marker <- append(z1,z2)
df_z <- as.data.frame(marker)
df_z$id <- 2
df_z$marker0 <-130
df_z$lump <- 0
df_z$other <-0

df <- rbind(df_y, df_z)

# my function

trim.others <- function(x,a,b,c) {
repeat {
  diptest <- dip.test(x, simulate.p.value = TRUE, B=500)
  if (diptest$p.value > 0.1) break
  hm <- classIntervals(x, n=2, style="jenks", method="complete")
  b[(a < hm$brks[2])] <- 1
  b[(a > hm$brks[2])] <- 2
  c[(b == 1) & (x > hm$brks[2])] <- 1
  c[(b == 2) & (x < hm$brks[2])] <- 1
  x <- x[c==0]
  }
  return(x)
}

Note that the function works if I run it on a single id. Here's the multimodal distribution of the marker in id = 1, followed by the unimodal distribution of y2 in the notional dataset.

df1 <- df[(df$id==1),]
plot(density(df1$marker)

dfa <- trim.others(df1$marker,df1$marker0,df1$lump,df1$other)
plot(density(dfa))

I'd like to run my function trim.others across the entire dataset, with multiple ids. Here's the approach I took.

dfs <- split(df, f = df$id)

myfunct <- function(w,x,y,z) {
  repeat {
    diptest <- dip.test(x, simulate.p.value = TRUE, B=500)
    if (diptest$p.value > 0.1) break
    hm <- classIntervals(x, n=2, style="jenks", method="complete")
    w[(y < hm$brks[2])] <- 1
    w[(y > hm$brks[2])] <- 2
    x[(w == 1) & (x > hm$brks[2])] <- 1
    x[(w == 2) & (x < hm$brks[2])] <- 1
     }
}

library(dplyr)

anotherfunct <- function(ia) {
  mutate(ia, value = myfunct(lump,marker,marker0,other))
}

dothefunct <- lapply(df,function(i) {anotherfunct(i)})

Thank you, group.

  • Including a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/5963610) would make it easier for others to help you. – Jaap Sep 24 '17 at 11:02
  • Will do, merci! –  Sep 24 '17 at 20:53

1 Answers1

0

I'd use purrr and do your splitting inside the loop:

library(dplyr)
library(purrr)

trim.others2  <- function(mydf) {

  x  <- mydf$marker
  a  <- mydf$marker0
  b  <- mydf$lump
  c  <- mydf$other

  repeat {

    diptest <- dip.test(x, simulate.p.value = TRUE, B=500)

    if (diptest$p.value > 0.1) break

    hm <- classIntervals(x, n=2, style="jenks", method="complete")
    b[(a < hm$brks[2])] <- 1
    b[(a > hm$brks[2])] <- 2
    c[(b == 1) & (x > hm$brks[2])] <- 1
    c[(b == 2) & (x < hm$brks[2])] <- 1
    x <- x[c==0]

  }
  return(x)
}

all_ids  <- unique(df$id) 

list_of_results  <- map(all_ids, ~df %>% filter(id == .x) %>% trim.others2 ) %>%
    setNames(all_ids)
crazybilly
  • 2,992
  • 1
  • 16
  • 42
  • Thank you very much, Crazybilly. I stepped away from this task for a bit, but I can try it now. Sorry for the delay. –  Apr 04 '18 at 22:02
  • 1
    Sounds good. If it works for you, please accept it as an answer! Thanks! – crazybilly Apr 05 '18 at 01:16