
I have a dataset like this, where Count is the running count of consecutive 1s in Correct for each participant (it resets to 0 after an incorrect trial):

Participant TrialNumber Correct Count
        118           1       1     1
        118           2       1     2
        118           3       1     3
        118           4       1     4
        118           5       1     5
        120           1       1     1
        120           2       0     0
        120           3       0     0
        120           4       1     1
        120           5       1     2
        121           1       1     1
        121           2       1     2
        121           3       1     3
        121           4       1     4
        121           5       0     0

For each participant, I need the TrialNumber at which Count reaches 4. If a participant never reaches a count of 4 (e.g. participant 120), I want 0 instead, so the expected result is 4 for 118, 0 for 120 and 4 for 121.

This is the code I have so far:

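library(dplyr)  # first() comes from dplyr, not base R
df <- df[with(df, !(Count > 4)), ]                           # drop rows where Count is above 4
df$Performance <- ifelse(df$Count == 4, df$TrialNumber, NA)  # TrialNumber where Count is 4, NA elsewhere
df_Performance <- aggregate(Performance ~ Participant, data = df, first)  # first value per participant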

The only problem is that I lose any participant who never has a count of 4.

I think I need a nested ifelse in the second line: if a participant has no 4s, write 0 for their first trial and NA for every other trial.

I'm stuck on how to write this part of the statement. Any suggestions would be much appreciated, thank you!
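A likely reason participant 120 disappears: the formula method of aggregate() drops rows containing NA, because its default is na.action = na.omit, and every Performance value for 120 is NA. A minimal base R sketch, assuming the example data below (rebuilt by hand from the table, with the unused Correct column omitted) and a hypothetical helper first_or_zero(), keeps those rows with na.action = na.pass and returns 0 when a participant never reached a count of 4:

```r
# Example data rebuilt by hand from the table above (Correct omitted; it is not used here)
df <- data.frame(
  Participant = rep(c(118, 120, 121), each = 5),
  TrialNumber = rep(1:5, times = 3),
  Count       = c(1, 2, 3, 4, 5,  1, 0, 0, 1, 2,  1, 2, 3, 4, 0)
)

# TrialNumber where Count is 4, NA on every other trial
df$Performance <- ifelse(df$Count == 4, df$TrialNumber, NA)

# Hypothetical helper: first non-NA value in a group, or 0 if the group is all NA
first_or_zero <- function(x) {
  x <- x[!is.na(x)]
  if (length(x) == 0) 0 else x[1]
}

# na.pass keeps the all-NA rows, so participant 120 still reaches first_or_zero()
df_Performance <- aggregate(Performance ~ Participant, data = df,
                            FUN = first_or_zero, na.action = na.pass)
df_Performance
#>   Participant Performance
#> 1         118           4
#> 2         120           0
#> 3         121           4
```

This keeps the structure of the original three lines; the filtering step on Count > 4 is not needed because only Count == 4 is ever selected.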

  • Please provide a [reproducible minimal example](https://stackoverflow.com/q/5963269/8107362). In particular, provide some sample data AND your expected outcome, e.g. with `dput()`, and use the [reprex-package](https://reprex.tidyverse.org/). – mnist Sep 16 '21 at 17:53

4 Answers


Using base R aggregate with if/else.

aggregate(Count~Participant, df, function(x) if(any(x == 4)) 4 else 0)

#  Participant Count
#1         118     4
#2         120     0
#3         121     4
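The if/else above returns the constant 4 rather than a TrialNumber; it matches the expected output here only because 118 and 121 both reach a count of 4 on trial 4. If the trial number itself is wanted, one base R sketch (assuming the same example data, rebuilt by hand below) uses match(), which gives the position of the first 4 in Count, or NA when there is none:

```r
# Example data rebuilt by hand from the question (Correct omitted; it is not used here)
df <- data.frame(
  Participant = rep(c(118, 120, 121), each = 5),
  TrialNumber = rep(1:5, times = 3),
  Count       = c(1, 2, 3, 4, 5,  1, 0, 0, 1, 2,  1, 2, 3, 4, 0)
)

# For each participant: TrialNumber of the first Count == 4, or 0 if there is none
trial_at_four <- sapply(split(df, df$Participant), function(d) {
  i <- match(4, d$Count)  # position of the first 4, NA if absent
  if (is.na(i)) 0 else d$TrialNumber[i]
})

res <- data.frame(Participant = as.numeric(names(trial_at_four)),
                  Performance = unname(trial_at_four))
res
#>   Participant Performance
#> 1         118           4
#> 2         120           0
#> 3         121           4
```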
Ronak Shah
library(tidyverse)
data <-
  tibble::tribble(
    ~Participant, ~TrialNumber, ~Correct, ~Count,
    118, 1, 1, 1,
    118, 2, 1, 2,
    118, 3, 1, 3,
    118, 4, 1, 4,
    118, 5, 1, 5,
    120, 1, 1, 1,
    120, 2, 0, 0,
    120, 3, 0, 0,
    120, 4, 1, 1,
    120, 5, 1, 2,
    121, 1, 1, 1,
    121, 2, 1, 2,
    121, 3, 1, 3,
    121, 4, 1, 4,
    121, 5, 0, 0
  )

data %>%
  group_by(Participant) %>%
  mutate(
    has_four = 4 %in% Count
  ) %>%
  distinct(has_four) %>%
  transmute(
    Participant,
    performance = ifelse(has_four, 4, 0)
  )
#> # A tibble: 3 × 2
#> # Groups:   Participant [3]
#>   Participant performance
#>         <dbl>       <dbl>
#> 1         118           4
#> 2         120           0
#> 3         121           4

Created on 2021-09-16 by the reprex package (v2.0.1)

danlooo
  • Hi @danlooo, thanks for giving the dataset in the correct format; I will do this next time. The code I have creates a new data frame with a variable called Performance, but I need Performance to contain 0 for any participant who doesn't have a count of 4 (so 120 should have 0). I've edited the code to fit the dataset in your answer: ```data <- data[with(data, !(Count > 4)), ]; data$Performance <- ifelse(data$Count == 4, data$TrialNumber, NA); data_Performance <- aggregate(Performance ~ Participant, data = data, first)``` – r.rodrigues18 Sep 16 '21 at 18:17
  • I updated my answer accordingly. – danlooo Sep 16 '21 at 18:21
  • Hi @danlooo, thank you very much! I also wanted it to take the TrialNumber where the count reached 4, and to write 0 on the first trial of any participant who never reached a count of 4. With the help of your code I've edited it to make it work: ```data1 <- data %>% group_by(Participant) %>% mutate(Performance = ifelse(Count == 4, TrialNumber, ifelse(!any(Count == 4) & TrialNumber == 1, 0, NA))) %>% ungroup(); data_performance <- aggregate(Performance ~ Participant, data = data1, first)``` – r.rodrigues18 Sep 16 '21 at 18:57
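For readers without dplyr, the nested ifelse() from the comment above can be sketched in base R. This assumes the same example data (rebuilt by hand below); ave() computes a per-participant "has any 4" flag in place of group_by() plus any(), and a plain x[1] stands in for dplyr's first():

```r
# Example data rebuilt by hand from the question (Correct omitted; it is not used here)
df <- data.frame(
  Participant = rep(c(118, 120, 121), each = 5),
  TrialNumber = rep(1:5, times = 3),
  Count       = c(1, 2, 3, 4, 5,  1, 0, 0, 1, 2,  1, 2, 3, 4, 0)
)

# TRUE on every row of a participant who has at least one Count == 4
has_four <- ave(df$Count == 4, df$Participant, FUN = any)

# TrialNumber where Count is 4; 0 on trial 1 for participants without a 4; NA otherwise
df$Performance <- ifelse(df$Count == 4, df$TrialNumber,
                         ifelse(!has_four & df$TrialNumber == 1, 0, NA))

# aggregate() drops the NA rows (default na.omit); keep the first remaining value
data_performance <- aggregate(Performance ~ Participant, data = df,
                              FUN = function(x) x[1])
data_performance
#>   Participant Performance
#> 1         118           4
#> 2         120           0
#> 3         121           4
```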

Here is another tidyverse approach. After grouping by Participant, use summarise to set the new column to the TrialNumber where Count is 4. If none of the trials has a 4 (checked with any), return 0 instead. This assumes each participant reaches a Count of 4 at most once.

library(tidyverse)

df %>%
  group_by(Participant) %>%
  summarise(Performance = ifelse(any(Count == 4), TrialNumber[Count == 4], 0))

Output

  Participant Performance
        <int>       <dbl>
1         118           4
2         120           0
3         121           4

Data

df <- structure(list(Participant = c(118L, 118L, 118L, 118L, 118L, 
120L, 120L, 120L, 120L, 120L, 121L, 121L, 121L, 121L, 121L), 
    TrialNumber = c(1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 
    2L, 3L, 4L, 5L), Correct = c(1L, 1L, 1L, 1L, 1L, 1L, 0L, 
    0L, 1L, 1L, 1L, 1L, 1L, 1L, 0L), Count = c(1L, 2L, 3L, 4L, 
    5L, 1L, 0L, 0L, 1L, 2L, 1L, 2L, 3L, 4L, 0L)), class = "data.frame", row.names = c(NA, 
-15L))
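On the one-match assumption above: with a length-one test, ifelse() returns a length-one result built from the first element of its yes argument, so additional matches are dropped silently rather than raising an error. A tiny illustration, using hypothetical trial numbers for a participant whose Count reaches 4 twice:

```r
# Hypothetical: a participant whose Count reaches 4 on trials 4 and 9
trials_at_four <- c(4L, 9L)

# Scalar test: only trials_at_four[1] survives, and no warning is raised
ifelse(TRUE, trials_at_four, 0)
#> [1] 4

# An explicit alternative that makes the "first match" choice visible
if (length(trials_at_four) > 0) trials_at_four[1] else 0
#> [1] 4
```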
Ben

Here's a base R approach.

data$Performance <- ifelse(data$Count == 4, data$TrialNumber, 0)
aggregate(x = data$Performance, by = list(Participant = data$Participant), FUN = sum)

#>   Participant x
#> 1         118 4
#> 2         120 0
#> 3         121 4
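Because sum() adds every non-zero Performance value in a group, this relies on each participant reaching a count of 4 at most once; Count resets after an incorrect trial, so a second streak is possible. A sketch with a hypothetical helper first_hit(), assuming TrialNumber is always positive so that 0 can mean "no match" (the example data is rebuilt by hand from the question), keeps only the first match instead:

```r
# Example data rebuilt by hand from the question (Correct omitted; it is not used here)
data <- data.frame(
  Participant = rep(c(118, 120, 121), each = 5),
  TrialNumber = rep(1:5, times = 3),
  Count       = c(1, 2, 3, 4, 5,  1, 0, 0, 1, 2,  1, 2, 3, 4, 0)
)
data$Performance <- ifelse(data$Count == 4, data$TrialNumber, 0)

# Hypothetical helper: first non-zero value, or 0 if the group has none
first_hit <- function(x) if (any(x > 0)) x[x > 0][1] else 0

res <- aggregate(x = data$Performance, by = list(Participant = data$Participant),
                 FUN = first_hit)

# Hypothetical participant whose Count reaches 4 on trials 4 and 9
x <- c(0, 0, 0, 4, 0, 0, 0, 0, 9)
sum(x)        # 13: both trial numbers added together
first_hit(x)  # 4: trial where Count first reached 4
```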
Skaqqs