(A big thank you for all the comments so far, especially those by dcarlson. They have helped me progress by giant leaps.)

UPDATE: I have refined my question on how to count peaks, with more visual backup to help understand and hopefully narrow down the missing syntax.

I am an R beginner and usually do all of this analysis by hand in Excel, but I want to automate the approach in R.

Here is a simple screenshot showing the type of dataset: [image: screenshot of the dataset]

I am using the following fake data (inspired by dcarlson's comment) on this platform to help make my questions more clear and will make it easier for you to help me:

set.seed(94)
Happiness <- round(runif(60, -100, 100))
ID <- rep(1:3, 20)
Stimuli <- rep(1:3, each = 20)  # blocks of 20 rows per stimulus; rep(1:3, 1) would be recycled to match ID row for row
DF <- data.frame(ID, Stimuli, Happiness)

The data frame "DF" summarizes 3 people who each looked at 3 different images. Happiness is the emotion they experienced while looking at each image over a period of time; each row of the data frame is one time sample (a fraction of a second).
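
One way to confirm that a layout like this gives every person rows for every stimulus is to cross-tabulate ID against Stimuli. The sketch below is self-contained and uses its own toy vectors. The `each = 20` layout is an assumption about the intended design, used here because a length-3 vector such as `rep(1:3, 1)` is recycled by `data.frame()` to 1, 2, 3, 1, 2, 3, ... and would then match `ID` in every row.

```r
# Toy layout (assumed): IDs cycle 1, 2, 3 while each stimulus fills a block of 20 rows
toy_id <- rep(1:3, times = 20)
toy_stimuli <- rep(1:3, each = 20)

# Rows per person and stimulus; every cell should be greater than 0
layout_check <- table(ID = toy_id, Stimuli = toy_stimuli)
layout_check

# For comparison: a length-3 vector recycled to 60 rows equals ID in every row
recycled <- rep(rep(1:3, 1), length.out = 60)
all(recycled == toy_id)  # TRUE
```

If any cell of the table is 0, that person has no rows for that stimulus and a per-person, per-stimulus summary cannot be computed for it.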

My goal:

1 - count how many DF$Happiness "peaks" went over each threshold (20/50/70) per person (DF$ID) and per stimulus (DF$Stimuli).

2 - count the total time (s) that the emotion Happiness was above the respective threshold.

After this I want to summarize the number of peaks that went above the thresholds.
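
To make the two goals concrete, here is a minimal sketch on a made-up vector, assuming a "peak" means an unbroken run of samples at or above the threshold. The names `x`, `above`, and `runs` are illustrative only and are not part of the dataset above.

```r
# Toy signal (made up): one value per time sample
x <- c(10, 30, 60, 15, 25, 80, 75, 5)
threshold <- 20

above <- x >= threshold        # TRUE where the sample is at or above the threshold
runs <- rle(above)             # lengths and values of consecutive TRUE/FALSE runs

n_peaks <- sum(runs$values)    # number of TRUE runs, i.e. number of peaks
n_samples_above <- sum(above)  # number of samples at or above the threshold

n_peaks          # 2
n_samples_above  # 5
```

Here the samples 30, 60 form one peak and 25, 80, 75 form a second, so there are 2 peaks covering 5 samples. Converting samples to seconds requires knowing how many rows make up one second.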

Goal Summary table 1: [image: target summary table 1]

Goal Summary table 2: [image: target summary table 2]

The same will also be done for peaks below the negative thresholds.

Step 1 (inspired by dcarlson's comment):

##split dataframe per respondent
DF.id <- split(DF, DF$ID)

My question: should I split according to Stimuli after this step and run the lapply() per Stimuli? My goal is to compare Happiness per Stimuli (DF$Stimuli) as an average across the people (DF$ID)
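
One possible approach, sketched under assumptions rather than offered as the definitive answer: summarize per person and stimulus first, then average those summaries over people within each stimulus. The example uses base R's `aggregate()` on its own toy data; the names `toy`, `per_cell`, and `per_stim` are hypothetical, and the summary shown (number of samples at or above 20) is only a stand-in for whichever peak or time measure is wanted.

```r
# Toy data (made up): 3 people, 3 stimuli, 60 time samples
set.seed(1)
toy <- data.frame(
  ID = rep(1:3, times = 20),
  Stimuli = rep(1:3, each = 20),
  Happiness = round(runif(60, -100, 100))
)

# Step 1: one summary value per person and stimulus
per_cell <- aggregate(Happiness ~ ID + Stimuli, data = toy,
                      FUN = function(h) sum(h >= 20))
names(per_cell)[3] <- "n_above_20"

# Step 2: average the per-person values within each stimulus
per_stim <- aggregate(n_above_20 ~ Stimuli, data = per_cell, FUN = mean)
names(per_stim)[2] <- "mean_n_above_20"
per_stim
```

Splitting with `split(DF, list(DF$ID, DF$Stimuli))` and running `lapply()` on each piece would reach the same per-cell values; `aggregate()` just keeps the ID and Stimuli labels as columns, which makes the second averaging step straightforward.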

#determine positive thresholds
low_thresh <- 20
med_thresh <- 50
high_thresh <- 70

#determine negative thresholds
low_neg_thresh <- -20
med_neg_thresh <- -50
high_neg_thresh <- -70

#function to create a matrix that flags Happiness values against the thresholds
Thresh <- function(X) {
  H_peaks_1a <- ifelse(X >= low_thresh ,1,0)
  H_peaks_2a <- ifelse(X >= med_thresh ,1,0)
  H_peaks_3a <- ifelse(X >= high_thresh ,1,0)
  H_neg_peaks_1a <- ifelse(X <= low_neg_thresh ,1,0)
  H_neg_peaks_2a <- ifelse(X <= med_neg_thresh ,1,0)
  H_neg_peaks_3a <- ifelse(X <= high_neg_thresh ,1,0)
  return(cbind(H_peaks_1a, H_peaks_2a, H_peaks_3a, H_neg_peaks_1a, H_neg_peaks_2a, H_neg_peaks_3a))
}

#apply Thresh to each respondent's Happiness values
H_peaks.ID <- lapply(DF.id, function(id) Thresh(id$Happiness)) #Question: what does "function(id)" mean here?
H_peaks.ID
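
On the `function(id)` part: it defines an anonymous (unnamed) function. `lapply()` calls it once for every element of the list, and `id` is simply the argument name that receives that element (here, one respondent's data frame). A minimal sketch with made-up data, where `toy`, `toy_by_id`, and `max_of_group` are illustrative names:

```r
# Made-up data: two respondents with two samples each
toy <- data.frame(ID = c(1, 1, 2, 2), Happiness = c(10, 60, -30, 80))
toy_by_id <- split(toy, toy$ID)

# Anonymous function: `id` is one respondent's data frame on each call
max_anon <- lapply(toy_by_id, function(id) max(id$Happiness))

# The same thing with a named function
max_of_group <- function(one_group) max(one_group$Happiness)
max_named <- lapply(toy_by_id, max_of_group)

identical(max_anon, max_named)  # TRUE
max_anon[["2"]]                 # 80
```

The argument could be called anything (`id`, `d`, `one_group`); only its position matters to `lapply()`.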

After this, I need to find a way to:

1 - count the clusters of consecutive "1"s to get the total number of peaks above each threshold.

[image: illustration of clusters of consecutive 1s]

2 - sum all the "1"s to get the total time above each threshold. (I am struggling to turn the matrix back into a vector or data frame.)
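
For the matrix-to-data-frame part, a minimal sketch with made-up data: `as.data.frame()` converts a matrix column by column, `cbind()` attaches the result to the original rows, and indexing by column name pulls out a plain vector. The names `toy`, `flag_matrix`, and `toy_flags` are illustrative only.

```r
# Made-up data and a small 0/1 indicator matrix with named columns
toy <- data.frame(ID = c(1, 1, 2, 2), Happiness = c(10, 60, 30, 80))
flag_matrix <- cbind(
  above_20 = ifelse(toy$Happiness >= 20, 1, 0),
  above_50 = ifelse(toy$Happiness >= 50, 1, 0)
)

# Matrix -> data frame, attached to the original rows
toy_flags <- cbind(toy, as.data.frame(flag_matrix))
toy_flags

# One matrix column -> plain numeric vector
one_column <- flag_matrix[, "above_50"]
one_column  # 0 1 0 1
```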

Thankful for any tips & guidance!

Smuts94
  • It would be extremely unlikely that anyone could really help without more information. It looks like you're new to SO; welcome to the community! If you want great answers quickly, it's best to make your question reproducible. This includes sample data like the output from `dput(head(dataObject))` and any libraries you are using. If your data is proprietary, make some fake data with a similar structure. Check it out: [making R reproducible questions](https://stackoverflow.com/q/5963269). – Kat Jul 24 '22 at 11:01
  • Thanks for feedback! Please review again, question was updated! – Smuts94 Jul 27 '22 at 18:21

2 Answers

Creating your own data is not as difficult as it may seem. This made-up data seems to represent your problem. If not, you can edit your question to provide more details and your own data:

set.seed(42)
Happiness <- round(runif(30, 0, 100))
ID <- rep(1:2, 15)
DFR <- data.frame(ID, Happiness)

DFR is a data frame with two columns, ID and Happiness. Now to analyze each ID separately we need to split the data frame:

DFR.ID <- split(DFR, DFR$ID)

DFR.ID is a list containing two data frames, one for each ID.
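
A quick way to see what `split()` returns is to look at the class, names, and row counts of the result. The sketch below rebuilds the same grouping with placeholder Happiness values (only the grouping matters here); `DFR_toy` and `DFR_toy.ID` are hypothetical names so as not to overwrite the objects above.

```r
# Same ID layout as above, with placeholder values instead of random ones
ID <- rep(1:2, 15)
Happiness <- seq_len(30)
DFR_toy <- data.frame(ID, Happiness)
DFR_toy.ID <- split(DFR_toy, DFR_toy$ID)

class(DFR_toy.ID)                 # "list"
names(DFR_toy.ID)                 # "1" "2"
sapply(DFR_toy.ID, nrow)          # 15 rows for each ID
DFR_toy.ID[["1"]]$Happiness[1:3]  # 1 3 5 (every other row belongs to ID 1)
```

The list elements are named after the ID values, so a single group can be pulled out with `[["1"]]` or `$`1``.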

low_thresh <- 20
med_thresh <- 50
high_thresh <- 70
Thresh <- function(X) {
    V_peaks_1a <- ifelse(X >= low_thresh ,1,0)
    V_peaks_2a <- ifelse(X >= med_thresh ,1,0)
    V_peaks_3a <- ifelse(X >= high_thresh ,1,0)
    return(cbind(V_peaks_1a, V_peaks_2a, V_peaks_3a))
}

The function Thresh defined above takes the Happiness values and returns a matrix with three columns, one per threshold, where 1 means the value is at or above that threshold and 0 means it is not. Finally, we apply the function to each ID, producing a list with one such matrix per ID:

V_peaks.ID <- lapply(DFR.ID, function(id) Thresh(id$Happiness))
V_peaks.ID
# $`1`
#       V_peaks_1a V_peaks_2a V_peaks_3a
#  [1,]          1          1          1
#  [2,]          1          0          0
#  [3,]          1          1          0
#  [4,]          1          1          1
#  [5,]          1          1          0
#  [6,]          1          0          0
#  [7,]          1          1          1
#  [8,]          1          0          0
#  [9,]          1          1          1
# [10,]          1          0          0
# [11,]          1          1          1
# [12,]          1          1          1
# [13,]          0          0          0
# [14,]          1          0          0
# [15,]          1          0          0
# 
# $`2`
#       V_peaks_1a V_peaks_2a V_peaks_3a
#  [1,]          1          1          1
#  [2,]          1          1          1
#  [3,]          1          1          0
#  [4,]          0          0          0
#  [5,]          1          1          1
#  [6,]          1          1          1
#  [7,]          1          0          0
#  [8,]          1          1          1
#  [9,]          0          0          0
# [10,]          1          1          0
# [11,]          0          0          0
# [12,]          1          1          1
# [13,]          1          1          0
# [14,]          1          1          1
# [15,]          1          1          1
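
The same kind of indicator matrix can also be built from a named vector of thresholds, which avoids writing one line per threshold. This is only a sketch of an alternative; `thresh_compact` and the toy input are hypothetical names and not part of the code above.

```r
# Named thresholds; the names become the column names of the result
thresholds <- c(low = 20, med = 50, high = 70)

# One 0/1 column per threshold (assumes x has more than one value,
# otherwise sapply() returns a vector instead of a matrix)
thresh_compact <- function(x, thresholds) {
  sapply(thresholds, function(th) as.integer(x >= th))
}

flags <- thresh_compact(c(10, 25, 55, 90), thresholds)
flags
#      low med high
# [1,]   0   0    0
# [2,]   1   0    0
# [3,]   1   1    0
# [4,]   1   1    1
```

Adding or changing a threshold then only means editing the `thresholds` vector.
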
dcarlson

I'm adding a separate answer to use the data you provided.

To get the amount of time, we sum the values in each column, which counts the rows (time samples) at or above each threshold:

time <- t(sapply(H_peaks.ID, function(x) apply(x, 2, sum)))
time <- as.data.frame(time)
time
#   H_peaks_1a H_peaks_2a H_peaks_3a H_neg_peaks_1a H_neg_peaks_2a H_neg_peaks_3a
# 1          7          5          2             10              7              5
# 2          7          6          4              8              7              4
# 3          8          5          4              7              5              4

We use sapply to process each group and, within each group, use apply to sum the columns.
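
As a side note, `colSums()` gives the same column totals as `apply(x, 2, sum)`, and the totals are counts of rows rather than seconds. The sketch below uses two small made-up matrices; `samples_per_second` is a purely hypothetical sampling rate, since the question does not state how many rows make up one second.

```r
# Made-up 0/1 matrices for two respondents
flag_list <- list(
  `1` = cbind(above_20 = c(1, 1, 0, 1), above_50 = c(0, 1, 0, 1)),
  `2` = cbind(above_20 = c(0, 1, 1, 1), above_50 = c(0, 0, 1, 0))
)

counts_apply <- t(sapply(flag_list, function(m) apply(m, 2, sum)))
counts_colsums <- t(sapply(flag_list, colSums))
all.equal(counts_apply, counts_colsums)  # TRUE

# ASSUMPTION: 10 rows per second (hypothetical; use the real sampling rate)
samples_per_second <- 10
seconds_above <- counts_colsums / samples_per_second
seconds_above
```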

Getting the number of peaks is a bit more complicated:

peaks <- t(sapply(H_peaks.ID, function(x) apply(x, 2, function(y) sum(diff(c(y, 0)) < 0))))
peaks <- as.data.frame(peaks)
peaks
#   H_peaks_1a H_peaks_2a H_peaks_3a H_neg_peaks_1a H_neg_peaks_2a H_neg_peaks_3a
# 1          5          5          2              4              3              2
# 2          4          5          4              6              6              4
# 3          4          4          3              5              4              4

For the number of peaks we use diff, which subtracts each value from the one that follows it. If a value is 1 and the next value is 0, the difference is -1, which marks the end of a peak, so counting the negative differences counts the peaks. We add a 0 at the end of each column to catch cases where the last value is 1 and the peak would otherwise never end.
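
A small worked vector makes the sign convention concrete. `diff()` returns each element minus the one before it, so a step from 0 to 1 gives +1 and a step from 1 to 0 gives -1. The vector `y` below is made up for illustration.

```r
# Made-up 0/1 column with three runs of 1s, the last one ending at the final row
y <- c(0, 1, 1, 0, 1, 0, 0, 1)

steps <- diff(c(y, 0))  # pad with a trailing 0 so the last run also ends
steps                   # 1  0 -1  1 -1  0  1 -1

sum(steps < 0)          # 3: counts the ends of runs
sum(diff(y) < 0)        # 2: without padding, the final run is missed

# Two equivalent checks: count run starts, or count runs of 1s with rle()
sum(diff(c(0, y)) > 0)          # 3
with(rle(y), sum(values == 1))  # 3
```

Counting run ends (with a trailing 0) and counting run starts (with a leading 0) give the same answer; either padding is needed so that a run touching the edge of the vector is not lost.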

dcarlson
  • Thank you! This has helped so much! I have two follow-up questions: – Smuts94 Aug 02 '22 at 08:52
  • 1. I split the data: DF.id <- split(DF, f = list(vd$ID, vd$Stimuli)) The column names are now "ID.Stimuli". Is there a way to transform these new DFs (Peaks & Time) to summarize the average of the IDs in one Stimuli column? 2. Is it complicated to change the thresholds to standard deviations of each respondent, i.e. the sd of DF$Happiness across all 3 DF$Stimuli per respondent? This would mean that each respondent has their own unique thresholds for counting the peaks... Is it still possible with apply(), or is a for loop needed here? – Smuts94 Aug 02 '22 at 09:00
  • These questions are an expansion of your original question. You should start a new question and provide reproducible data. – dcarlson Aug 02 '22 at 16:49
  • please see new questions 1. https://stackoverflow.com/questions/73224162/count-peaks-in-r-followup 2. https://stackoverflow.com/questions/73248458/counting-peaks-in-r-using-personalised-sd-per-id-as-thresholds-to-count-peaks – Smuts94 Aug 08 '22 at 15:25