(A big thank you for all the comments so far, especially those by dcarlson. They have helped me progress in giant leaps.)
UPDATE: I have refined my question about how to count peaks and added more visual backup to make the problem easier to understand and, hopefully, to narrow down the syntax I am missing.
I am an R beginner and usually do all of this analysis by hand in Excel, but I want to automate the approach in R.
Here is a simple screenshot to understand the dataset type.
I am using the following fake data (inspired by dcarlson's comment) to make my questions clearer and to make it easier for you to help me:
set.seed(94)
Happiness <- round(runif(60, -100, 100))
ID <- rep(1:3, 20)
Stimuli <- rep(1:3, each = 20) #each stimulus covers 20 rows, so every ID has rows for all 3 stimuli
DF <- data.frame(ID, Stimuli, Happiness)
The data frame "DF" summarises 3 people who each looked at 3 different images. Happiness is the emotion they experienced while looking at the images over a certain period of time (each row of the data frame is a different fraction of 1 second).
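A cross-tabulation is a quick way to confirm that every person really has rows for every stimulus before any peaks are counted. The sketch below is only an illustration: it builds its own copy of the fake data under the name DF_demo and assumes Stimuli is laid out with each = 20, which is a guess at the intended design rather than the real recording order.

```r
# Illustrative copy of the fake data (DF_demo is a hypothetical name)
set.seed(94)
DF_demo <- data.frame(
  ID        = rep(1:3, 20),
  Stimuli   = rep(1:3, each = 20),  # assumption: every ID sees every stimulus
  Happiness = round(runif(60, -100, 100))
)

# Rows are IDs, columns are stimuli, each cell is the number of samples
design <- table(ID = DF_demo$ID, Stimuli = DF_demo$Stimuli)
design
```

If any cell of the table is 0, that person has no data for that stimulus and a per-person, per-stimulus comparison is not possible for that combination.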
My goal:
1 - count how many DF$Happiness "peaks" went over different thresholds (20/50/70) per DF$ID (per person) per DF$Stimuli (per stimulus).
2 - count the total time (s) that the emotion Happiness was above the respective threshold.
After this I want to summarize the number of peaks that went above the thresholds.
The same will also be done for peaks below the negative thresholds (-20/-50/-70).
Step 1 (inspired by dcarlson's comment):
##split dataframe per respondent
DF.id <- split(DF, DF$ID)
My question: should I also split by Stimuli after this step and run lapply() per stimulus? My goal is to compare Happiness per stimulus (DF$Stimuli), averaged across the people (DF$ID).
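One possible approach, sketched here under assumptions, is to split by both columns at once instead of splitting twice. split() accepts a list of grouping vectors, so each list element is one ID x Stimuli combination. The data frame DF_demo and the each = 20 layout are illustrative assumptions, not the real data.

```r
# Illustrative copy of the fake data (DF_demo is a hypothetical name)
set.seed(94)
DF_demo <- data.frame(
  ID        = rep(1:3, 20),
  Stimuli   = rep(1:3, each = 20),  # assumption: every ID sees every stimulus
  Happiness = round(runif(60, -100, 100))
)

# One list element per ID x Stimuli combination, named "1.1", "2.1", ...
DF.id.stim <- split(DF_demo, list(DF_demo$ID, DF_demo$Stimuli), drop = TRUE)
names(DF.id.stim)

# Mean Happiness per person and stimulus, then averaged across people
per_group <- aggregate(Happiness ~ ID + Stimuli, data = DF_demo, FUN = mean)
per_stim  <- aggregate(Happiness ~ Stimuli, data = per_group, FUN = mean)
per_stim
```

Averaging in two steps gives every person equal weight per stimulus, even when the number of rows per person differs slightly.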
#determine positive thresholds
low_thresh <- 20
med_thresh <- 50
high_thresh <- 70
#determine negative thresholds
low_neg_thresh <- -20
med_neg_thresh <- -50
high_neg_thresh <- -70
#function to create a 0/1 matrix that flags Happiness against each threshold
#(the thresholds are read from the variables defined above)
Thresh <- function(X) {
  H_peaks_1a <- ifelse(X >= low_thresh, 1, 0)
  H_peaks_2a <- ifelse(X >= med_thresh, 1, 0)
  H_peaks_3a <- ifelse(X >= high_thresh, 1, 0)
  H_neg_peaks_1a <- ifelse(X <= low_neg_thresh, 1, 0)
  H_neg_peaks_2a <- ifelse(X <= med_neg_thresh, 1, 0)
  H_neg_peaks_3a <- ifelse(X <= high_neg_thresh, 1, 0)
  return(cbind(H_peaks_1a, H_peaks_2a, H_peaks_3a, H_neg_peaks_1a, H_neg_peaks_2a, H_neg_peaks_3a))
}
#run matrix
H_peaks.ID <- lapply(DF.id, function(id) Thresh(id$Happiness)) #Question: what does "function(id)" mean here?
H_peaks.ID
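On the function(id) part: in a call like lapply(some_list, function(id) ...), function(id) defines an anonymous (unnamed) function. lapply() calls it once per list element, and id is simply the argument name that holds the current element. The small example below uses made-up data (demo_list and max_happy are hypothetical names) to show that an anonymous function and a named one behave the same way.

```r
# Small made-up list standing in for a list of per-person data frames
demo_list <- list(
  a = data.frame(Happiness = c(10, 60)),
  b = data.frame(Happiness = c(-80, 30))
)

# Anonymous function: `id` receives each list element (a data frame) in turn
res_anon <- lapply(demo_list, function(id) max(id$Happiness))

# The same logic written as a named function
max_happy <- function(id) max(id$Happiness)
res_named <- lapply(demo_list, max_happy)

identical(res_anon, res_named)
```

The name id is arbitrary. function(d) max(d$Happiness) would do exactly the same thing.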
After this, I need to find a solution to:
1 - count all the "1"-clusters to get the total "number of peaks" above the thresholds.
2 - sum all the "1"s to get the total time above the thresholds. (I am struggling to turn the matrix back into a vector or data frame.)
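A minimal sketch of one way to do both counts is run-length encoding with rle(), which collapses a vector into runs of equal values. Each run of 1s is one "peak", and the number of 1s times the duration of a row gives the time. The helper peak_summary, the toy vector x, and dt = 0.5 seconds per row are all assumptions for illustration; the real sampling interval has to be substituted.

```r
# Hypothetical helper: summarise one 0/1 (or TRUE/FALSE) indicator vector
# dt = seconds per row; the value used below is a placeholder
peak_summary <- function(flag, dt = 1) {
  runs <- rle(as.integer(flag))
  c(n_peaks = sum(runs$values == 1),  # number of separate runs of 1s
    time_s  = sum(flag) * dt)         # rows at or above threshold x seconds per row
}

# Toy signal: three separate stretches at or above 20, one at or above 70
x <- c(5, 25, 30, 10, 60, 75, 80, 15, 22)
single <- peak_summary(x >= 20, dt = 0.5)

# Apply to every column of a 0/1 matrix like the one built from the thresholds
m <- cbind(low = as.integer(x >= 20), high = as.integer(x >= 70))
res <- apply(m, 2, peak_summary, dt = 0.5)

# Turn the result into a data frame: one row per threshold
out <- as.data.frame(t(res))
out
```

Runs should be counted within one person and one stimulus at a time, otherwise a run of 1s could continue across the boundary between two recordings. Negative peaks work the same way with a condition such as x <= -20.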
Thankful for any tips & guidance!