1

This is a followup question to my original "Count Peaks with R" question.

In this question I want to use standard deviations to derive thresholds per person in the dataset...

Here is fake data to make my everything reproducible:

set.seed(9494)
Happiness <- round(runif(100, -100, 100))
ID <- rep(c("ID1", "ID2", "ID3", "ID4", "ID5"), 20)
Stimuli <- rep(1:4, 1)
DF <- data.frame(ID, Stimuli, Happiness)

Explanation: Dataframe "DF" is a summary of 5 people that each looked at 3 different stimuli. Happiness is the emotion that they experienced while looking at the stimuli for a certain period of time (in the dataframe each row is a different portion of 1 second)

Thanks to dcarlson's input on my previous question, this question builds on counting peaks in R by taking the standard deviation of each ID's respective Happiness across all stimuli watched.

Step 1: calculate thresholds per person (DF$ID)

First, I want to split the dataframe DF by ID:

#split dataframe per ID AND stimuli
DF.id <- split(DF, DF$ID)

Next I created this function to calculate the thresholds per person (ID):

TR_SD <- function(Y){
  SD1_thresh <- mean(Y) + (1*sd(Y))
  SD2_thresh <- mean(Y) + (2*sd(Y))
  SD3_thresh <- mean(Y) + (3*sd(Y))
  SD1_neg_thresh <- mean(Y) - (1*sd(Y))
  SD2_neg_thresh <- mean(Y) - (2*sd(Y))
  SD3_neg_thresh <- mean(Y) - (3*sd(Y))
  return(cbind(SD1_thresh, SD2_thresh, SD3_thresh, SD1_neg_thresh, SD2_neg_thresh, SD3_neg_thresh))
}

SD.Thresh <- lapply(DF.id, function(ID) TR_SD(ID$Happiness))
SD.Thresh

Step 2: Now the function that determines whether each Happiness value is above (TRUE = 1) or below (FALSE = 0) the above thresholds to count the peaks:

#function to create matrix that analyzes Happiness based on threshholds
Thresh <- function(X) {
  H_peaks_1a <- ifelse(X >= SD1_thresh ,1,0)
  H_peaks_2a <- ifelse(X >= SD2_thresh,1,0)
  H_peaks_3a <- ifelse(X >= SD3_thresh,1,0)
  H_neg_peaks_1a <- ifelse(X <= SD1_neg_thresh ,1,0)
  H_neg_peaks_2a <- ifelse(X <= SD2_neg_thresh ,1,0)
  H_neg_peaks_3a <- ifelse(X <= SD3_neg_thresh ,1,0)
  return(cbind(H_peaks_1a, H_peaks_2a, H_peaks_3a, H_neg_peaks_1a, H_neg_peaks_2a, H_neg_peaks_3a))
}

#run function
H_peaks.ID <- lapply(DF.id, function(ID) Thresh(ID$Happiness))

Question: how can I use the threshold results out of SD.Thresh per person (ID) as individual thresholds to apply to each respective ID to then count the peaks?

Use the ID1 results here... enter image description here ... as thresholds for ID1 when counting peaks

 Thresh <- function(X) {
      H_peaks_1a <- ifelse(X >= SD1_thresh ,1,0)
      H_peaks_2a <- ifelse(X >= SD2_thresh,1,0)
      H_peaks_3a <- ifelse(X >= SD3_thresh,1,0)
      H_neg_peaks_1a <- ifelse(X <= SD1_neg_thresh ,1,0)
      H_neg_peaks_2a <- ifelse(X <= SD2_neg_thresh ,1,0)
      H_neg_peaks_3a <- ifelse(X <= SD3_neg_thresh ,1,0)
      return(cbind(H_peaks_1a, H_peaks_2a, H_peaks_3a, H_neg_peaks_1a, H_neg_peaks_2a, H_neg_peaks_3a))
    }

Step 3: count peaks above thresholds

#count peaks
peaks <- t(sapply(H_peaks.ID, function(x) apply(x, 2, function(y) sum(diff(c(y, 0)) < 0))))
peaks <- as.data.frame(peaks)
peaks

Step 4: count time above thresholds

#total time / frames above threshhold
time <- t(sapply(H_peaks.ID, function(x) apply(x, 2, sum)))
time <- as.data.frame(time)
time

I am striving for the following final overview: enter image description here

gaut
  • 5,771
  • 1
  • 14
  • 45
Smuts94
  • 49
  • 7
  • pleae clarify what you're after – gaut Aug 07 '22 at 12:18
  • @gaut: as stated above, I want to determine thresholds to count DF$Happiness peaks by using the standard deviation of the Happiness values per respective person (DF$ID) as personalized thresholds per person (DF$ID) – Smuts94 Aug 09 '22 at 13:31
  • I think that an example of your expected result would go a long way. it is standard practice on stackoverflow – gaut Aug 09 '22 at 13:42
  • @gaut: done. There are multiple steps needed to be done to try and get to the final overview I'm after. I'm stuck on a step early in this analysis: using the results per ID of one function in the next function. – Smuts94 Aug 09 '22 at 14:41
  • good. I will take a look – gaut Aug 09 '22 at 14:59
  • If you'd like to ask other questions related to your next step, I'd suggest to do so in a separate question which url you can paste in these comments. Try to limit your problem to only the task at hand, eg describe only your last step, not all the above. Make a reprex from your current step, and state your expected output. Welcome to SO – gaut Aug 09 '22 at 23:14
  • So your Step 2 is not runnable as you post, correct? Why not forgo first function and use its _thresh calcs in second *before* `ifelse` calls? – Parfait Aug 09 '22 at 23:58

1 Answers1

0

Okay I believe that the below answers your question and explains a few ways to improve your code: creating better functions, and using the dplyr package.

Your functions should be doing the least amount of computation possible, in order to avoid repeating yourself. as per R for Data Science by Hadley for example, on Functional Programming.

To prevent bugs and to make more flexible code, adopt the “do not repeat yourself”, or DRY, principle.

In the below you'll notice that the Thresh() and TR_sd() functions now only do one vector at a time, and operate on vectors rather than full data.frames.

Thresh() now also accepts another argument (thresh) defining the threshold. This is good practice, instead of relying on already-declared variables in your global environment.

library(dplyr)
set.seed(9494)
Happiness <- round(runif(100, -100, 100))
ID <- rep(c("ID1", "ID2", "ID3", "ID4", "ID5"), 20)
Stimuli <- rep(1:4, 1)
DF <- data.frame(ID, Stimuli, Happiness)

#split dataframe per ID AND stimuli
DF.id <- split(DF, DF$ID)

# 1. better functions for thresholds --------------------------------------
TR_SD <- function(y) {
  SD_thresh <- mean(y) + (1*sd(y))
  SD_thresh # in R, no need to explicitly use return(xx), just xx at the end of the function will do the job
}

# base syntax -------------------------------------------------------------

SD.Thresh <- sapply(1:length(DF.id), function(x) TR_SD(DF.id[[x]]$Happiness) )
thresh_df <- data.frame(ID=c("ID1", "ID2", "ID3", "ID4", "ID5"), SD.Thresh)
thresh_df
#>    ID SD.Thresh
#> 1 ID1  63.13333
#> 2 ID2  91.21786
#> 3 ID3  30.48874
#> 4 ID4  29.75186
#> 5 ID5  39.33467

# dplyr syntax ------------------------------------------------------------
thresh_df_dplyr <- DF %>% group_by(ID) %>% summarise(SD.Thresh = TR_SD(Happiness))
identical(as.data.frame(thresh_df_dplyr), thresh_df) # shows that these are the same
#> [1] TRUE

# 2: question: how to use the thresholds to count the peaks ---------------------------------------------------------

# base --------------------------------------------------------------------
Thresh <- function(X, thresh) {
  H_peaks <- ifelse(X >= thresh ,1,0)
  return(H_peaks) # in R, no need to explicitly use return(xx), just xx at the end of the function will do the job
}
H_peaks.list <- lapply(1:length(DF.id), function(i) Thresh(DF.id[[i]]$Happiness, thresh = thresh_df$SD.Thresh[i]))
names(H_peaks.list) <- unique(ID)
H_peaks_df <- as.data.frame(H_peaks.list)
head(H_peaks_df,3)
#>   ID1 ID2 ID3 ID4 ID5
#> 1   0   0   0   1   1
#> 2   0   0   0   1   0
#> 3   0   0   0   0   0
sapply(H_peaks_df, sum)
# ID1 ID2 ID3 ID4 ID5 
#  4   1   4   5   5
Created on 2022-08-10 by the reprex package (v2.0.1)

Now, the power of the dplyr syntax is that you can do all of the above in just one single line of chained operations.

# dplyr -------------------------------------------------------------------

H_peaks_dplyr <- DF %>% group_by(ID) %>% summarise(SD.Thresh = TR_SD(Happiness), nb_peaks = sum(Happiness>SD.Thresh))
H_peaks_dplyr # in one line we're doing both steps
#> # A tibble: 5 x 3
#>   ID    SD.Thresh nb_peaks
#>   <chr>     <dbl>    <int>
#> 1 ID1        63.1        4
#> 2 ID2        91.2        1
#> 3 ID3        30.5        4
#> 4 ID4        29.8        5
#> 5 ID5        39.3        5
gaut
  • 5,771
  • 1
  • 14
  • 45
  • Thank you! I will practice until I understand this better! To clarify: if I want 6x thresholds... does it make more sense to duplicate the above 6 times or can one function include 6 thresholds? – Smuts94 Aug 10 '22 at 11:53
  • the function should be repeated. if my answer is satisfactory, dont hesitate to mark it as accepted – gaut Aug 10 '22 at 11:57
  • dear @gaut: please could you check the followup question here https://stackoverflow.com/questions/73330615/counting-peaks-in-r-per-groups-of-a-data-frame as I have stumbled across a hurdle in this code. Thanks! – Smuts94 Aug 12 '22 at 07:25