Thanks in advance to anyone who can help. Say I have a data frame of players' answers to quiz questions in a certain category (the "countries" columns in the table, which was originally attached as an image).

A 0 means a wrong answer and a 1 means a right answer. Let's also say I have 40 country questions and 100 players. How would I loop over the players and change certain scores to a string like "disc" once a player has answered a certain number of questions incorrectly?

That is, when a player has answered, say, 3 of the first 4 questions in the countries category incorrectly (as player one has), how would I change each of that player's subsequent answers to "disc"? Those answers would no longer count, whether correct or incorrect, because 3 of the first 4 questions were wrong. I would like to do this over the whole data frame for each player, but only in the "countries" category (so not the flag category).

1 Answer

Here is an example of how you can achieve this. In the future, it would be best to attach a reproducible example.

Data


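set.seed(34)
# 10 players, 9 "Countries" questions and 1 "Flag" question (0 = wrong, 1 = right)
df1 = data.frame(player = 1:10, replicate(9, sample(0:1, 10, replace = TRUE)), Flag = sample(0:1, 10, replace = TRUE))
# prefix the 9 question columns (X1..X9) with "Countries"
names(df1)[2:10] = paste0("Countries", names(df1)[2:10])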
> df1
   player CountriesX1 CountriesX2 CountriesX3 CountriesX4 CountriesX5 CountriesX6 CountriesX7 CountriesX8 CountriesX9 Flag
1       1           0           0           0           0           1           1           1           1           0    0
2       2           0           1           0           0           1           0           1           0           1    0
3       3           0           1           1           0           1           0           1           0           1    1
4       4           1           1           0           0           0           1           1           1           0    0
5       5           1           1           1           1           1           1           1           0           1    1
6       6           1           1           0           0           1           1           1           0           1    1
7       7           1           0           1           0           1           0           0           0           1    1
8       8           1           0           0           1           0           0           1           0           0    0
9       9           1           0           1           0           1           1           0           0           0    0
10     10           0           0           0           1           1           0           1           0           1    1

Code


One option is to create an intermediate data frame that holds, for each player, the cumulative number of wrong answers across the "Countries" questions:

library(dplyr)   # %>%, filter(), mutate(), across()
library(janitor) # row_to_names()

df2 = t(df1) %>% #transpose df1
  row_to_names(row_number = 1) %>% #names the columns with row 1
  as.data.frame() %>% # transform as data frame
  filter(grepl("Countries",rownames(.)))%>% # filter to have only "Countries" rows
  mutate(across(everything(), ~ifelse(.x==1, 0,1))) %>% #Invert 0 and 1
  mutate(across(everything(), ~cumsum(.x))) %>% #Calculate cumulative sum
  t() #transpose the new data frame

> df2
   CountriesX1 CountriesX2 CountriesX3 CountriesX4 CountriesX5 CountriesX6 CountriesX7 CountriesX8 CountriesX9
1            1           2           3           4           4           4           4           4           5
2            1           1           2           3           3           4           4           5           5
3            1           1           1           2           2           3           3           4           4
4            0           0           1           2           3           3           3           3           4
5            0           0           0           0           0           0           0           1           1
6            0           0           1           2           2           2           2           3           3
7            0           1           1           2           2           3           4           5           5
8            0           1           2           2           3           4           4           5           6
9            0           1           1           2           2           2           3           4           5
10           1           2           3           3           3           4           4           5           5
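
As a cross-check, the same cumulative error count can be sketched in base R without the double transpose. The data frame `toy` and the names `country_cols` and `errors` below are made up for illustration; they are not the `df1` and `df2` above.

```r
# Hypothetical toy data: 3 players, 4 country questions (1 = right, 0 = wrong)
toy <- data.frame(
  player      = 1:3,
  CountriesX1 = c(0, 1, 1),
  CountriesX2 = c(0, 1, 0),
  CountriesX3 = c(0, 0, 1),
  CountriesX4 = c(1, 0, 0)
)

# Select the country columns by name rather than by position
country_cols <- grep("^Countries", names(toy))

# 1 - answer turns a right answer (1) into 0 and a wrong answer (0) into 1.
# apply(..., 1, cumsum) returns one column per player, so transpose back
# to get one row per player and one column per question.
errors <- t(apply(1 - as.matrix(toy[country_cols]), 1, cumsum))
errors
```

Each row of `errors` is a running count of wrong answers for one player, which is the quantity the threshold is applied to in the next step.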

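Then the initial data frame (df1) can be modified using the intermediate one (df2) as a logical mask. For the columns of interest, here "Countries", a threshold changes values to "Dis" once more than 2 errors have been made. Note that the answer that brings the count to 3 is itself replaced, and that any column receiving a "Dis" value becomes a character column: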

df1[2:10][df2>2]="Dis" 

> df1
   player CountriesX1 CountriesX2 CountriesX3 CountriesX4 CountriesX5 CountriesX6 CountriesX7 CountriesX8 CountriesX9 Flag
1       1           0           0         Dis         Dis         Dis         Dis         Dis         Dis         Dis    0
2       2           0           1           0         Dis         Dis         Dis         Dis         Dis         Dis    0
3       3           0           1           1           0           1         Dis         Dis         Dis         Dis    1
4       4           1           1           0           0         Dis         Dis         Dis         Dis         Dis    0
5       5           1           1           1           1           1           1           1           0           1    1
6       6           1           1           0           0           1           1           1         Dis         Dis    1
7       7           1           0           1           0           1         Dis         Dis         Dis         Dis    1
8       8           1           0           0           1         Dis         Dis         Dis         Dis         Dis    0
9       9           1           0           1           0           1           1         Dis         Dis         Dis    0
10     10           0           0         Dis         Dis         Dis         Dis         Dis         Dis         Dis    1
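
For a larger table (for example the 40 questions and 100 players mentioned in the question), the column positions `2:10` would have to change. A minimal sketch of one way to generalise this is to select the columns by name inside a small helper. The function `disqualify_after` and its arguments are hypothetical names chosen for illustration, and the toy data is made up.

```r
# Hypothetical helper: replace answers with "Dis" from the point where a
# player's running error count exceeds max_errors.
# Assumes at least two columns match `pattern` and that answers are 0/1.
disqualify_after <- function(df, pattern = "^Countries", max_errors = 2) {
  cols    <- grep(pattern, names(df))
  answers <- as.matrix(df[cols])
  # running count of wrong answers, one row per player
  errors  <- t(apply(1 - answers, 1, cumsum))
  # logical-matrix replacement; affected columns become character
  df[cols][errors > max_errors] <- "Dis"
  df
}

# Made-up data: 3 players, 4 country questions and 1 flag question
toy <- data.frame(
  player      = 1:3,
  CountriesX1 = c(0, 1, 1),
  CountriesX2 = c(0, 1, 0),
  CountriesX3 = c(0, 0, 1),
  CountriesX4 = c(1, 0, 0),
  Flag        = c(0, 1, 1)
)

out <- disqualify_after(toy)
out
```

Only player 1 reaches a third error here, so only that player's third and fourth answers are replaced, and the `Flag` column is left alone because it does not match the pattern.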
Bushidov
  • Thank you for the suggestion and for the code annotations. I'm going to try to implement this for a professional task next week, so I may need to tweak it slightly, but it certainly answers my question. Thanks again! – user15950867 Sep 09 '22 at 19:59
  • Could it also be modified to check the criterion on the first 4 questions only? That is, if the player gets more than 2 of the first four questions wrong, then all the rest become "disc". The player can make errors after the first four and still carry on (no "disc"), as long as they make no more than two errors in the first four. When I try df[1:4][df1>3]="Dis" I get an error: Error in `[<-.data.frame`(`*tmp*`, df1 > 3, value = "Dis") : unsupported matrix index in replacement – user15950867 Sep 12 '22 at 21:04
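
The error in the last comment most likely arises because the logical matrix `df1 > 3` has the dimensions of the whole data frame, while `df[1:4]` has only four columns; a logical matrix used as an index must match the dimensions of the object it indexes. A minimal sketch of the "first four questions" rule follows, assuming that more than 2 errors in the first 4 country questions disqualifies every later country answer. The data and the names `first_n`, `max_errors`, `screen_cols` and `later_cols` are made up for illustration.

```r
# Made-up data: 3 players, 6 country questions (1 = right, 0 = wrong)
toy <- data.frame(
  player      = 1:3,
  CountriesX1 = c(0, 1, 0),
  CountriesX2 = c(0, 1, 1),
  CountriesX3 = c(0, 0, 1),
  CountriesX4 = c(1, 0, 1),
  CountriesX5 = c(1, 0, 0),
  CountriesX6 = c(0, 0, 1)
)

first_n    <- 4  # number of screening questions (assumption)
max_errors <- 2  # more errors than this disqualifies the player (assumption)

country_cols <- grep("^Countries", names(toy))
screen_cols  <- country_cols[seq_len(first_n)]   # first four country columns
later_cols   <- country_cols[-seq_len(first_n)]  # the remaining country columns

# Number of wrong answers per player among the screening questions only
early_errors <- rowSums(toy[screen_cols] == 0)
disqualified <- early_errors > max_errors

# Overwrite the later answers of disqualified players; errors made after the
# screening questions do not matter. The later columns become character.
toy[disqualified, later_cols] <- "Dis"
toy
```

Here only player 1 has 3 errors in the first four questions, so only that player's fifth and sixth answers are replaced. Player 2 keeps all answers despite two further errors later on.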