
I am in the process of randomly assigning treatments for an experiment. I have four sites (Site1, ..., Site4) where 12 experimental units (1, ..., 12) are replicated four times (replicates 1, ..., 4). Within each replicate, I have randomly assigned one of three treatments (trt1, ..., trt3) to each unit.

I now need to assign a SecondTreatment ("y" or "n") to each Treatment within a Replicate for each of my Sites. trt2 should always be "y", whereas I want to randomly assign "y" to half of trt1 and "n" to the other half of trt1, and then do the same for trt3. This should give me, for each Replicate: trt2 with 4 "y", trt1 with 2 "y" and 2 "n", and trt3 with 2 "y" and 2 "n".

My data looks like this:

Site      Experimental unit     Replicate   Treatment        SecondTreatmentAssign (y/n)
Site1              1                1         trt1  
Site1              2                1         trt2  
Site1              3                1         trt3  
Site1              4                1         trt3  
Site1              5                1         trt1  
Site1              6                1         trt2  
Site1              7                1         trt3  
Site1              8                1         trt2  
Site1              9                1         trt1  
Site1              10               1         trt1  
Site1              11               1         trt3  
Site1              12               1         trt2  
Site1              1                2         trt2  
Site1              2                2         trt3  
Site1              3                2         trt1  
Site1              4                2         trt2  
Site1              5                2         trt1  
Site1              6                2         trt3  
Site1              7                2         trt2  
Site1              8                2         trt2         
Site1              9                2         trt1  
Site1              10               2         trt2  
Site1              11               2         trt1  
Site1              12               2         trt3      
Site1              1                3         trt2  
Site1              2                3         trt1  
Site1              3                3         trt3  
Site1              4                3         trt3  
Site1              5                3         trt2  
Site1              6                3         trt1  
Site1              7                3         trt3  
Site1              8                3         trt2  
Site1              9                3         trt1  
Site1              10               3         trt1  
Site1              11               3         trt3  
Site1              12               3         trt2  
Site1              1                4         trt3  
Site1              2                4         trt2  
Site1              3                4         trt1  
Site1              4                4         trt3  
Site1              5                4         trt2  
Site1              6                4         trt1  
Site1              7                4         trt3  
Site1              8                4         trt1  
Site1              9                4         trt2  
Site1              10               4         trt1  
Site1              11               4         trt2  
Site1              12               4         trt3
  .                .                .           .
  .                .                .           .
  .                .                .           .
Site4              12               4         trt1  

I'd like to do this in a way that writes these assignments back into the data frame, so that I don't have to move anything around manually. I am still quite the novice with programming and not sure how to make this happen.

Thanks!

1 Answer


I would do this:

library(dplyr)
data <- data %>%
  arrange(runif(n())) %>% # randomize the row order
  group_by(Site, Replicate, Treatment) %>% # work within each treatment group
  mutate(
    Treat_2 = case_when(
      Treatment == "trt2" ~ "y", # trt2 always gets "y"
      row_number() <= n() / 2 ~ "y", # first half of each shuffled group gets "y"
      TRUE ~ "n" # the remaining half gets "n"
    )
  ) %>%
  ungroup() %>% # drop the grouping before returning the result
  arrange(Site, Replicate, `Experimental unit`) # return to the original order
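
To check the approach end to end, here is a minimal sketch on made-up data. The `data` object built below is an assumption: a hypothetical, fully balanced design (exactly 4 units per treatment in every replicate) generated only for illustration, with column names taken from the question. The `check` table is likewise just a helper name for this example.

```r
library(dplyr)

set.seed(42) # only so the illustration is reproducible

# Hypothetical stand-in for the real data: 4 sites x 4 replicates x 12 units,
# with each treatment randomly given to exactly 4 units per replicate.
data <- tibble(
  Site = rep(paste0("Site", 1:4), each = 48),
  `Experimental unit` = rep(1:12, times = 16),
  Replicate = rep(rep(1:4, each = 12), times = 4)
) %>%
  group_by(Site, Replicate) %>%
  mutate(Treatment = sample(rep(c("trt1", "trt2", "trt3"), each = 4))) %>%
  ungroup()

# Shuffle the rows, then split each Site/Replicate/Treatment group in half.
data <- data %>%
  arrange(runif(n())) %>%
  group_by(Site, Replicate, Treatment) %>%
  mutate(
    Treat_2 = case_when(
      Treatment == "trt2" ~ "y",
      row_number() <= n() / 2 ~ "y",
      TRUE ~ "n"
    )
  ) %>%
  ungroup() %>%
  arrange(Site, Replicate, `Experimental unit`)

# Tabulate the assignments per group to confirm the split.
check <- data %>% count(Site, Replicate, Treatment, Treat_2)
print(check, n = 10)
```

In this balanced example, every trt2 group should show 4 "y", and every trt1 and trt3 group should show 2 "y" and 2 "n". With real data, running the same `count()` call is a quick way to confirm the assignment before using it.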
Gregor Thomas
  • I haven't had the time to test this, but looking at your response makes me wonder if some of my language in my question was imprecise. For trt 1 and trt 3, I want to assign "y" and "n" to half of the items in that trt group, not necessarily assigning something to the first half, then the second half. Does this code work in that manner (i.e., assign "y" to the first half of the trt group, then "n" to the second half of the trt group), or will it randomly assign "y" and "n" within the groups? – Todd D. Johnson May 04 '20 at 21:02
  • The `arrange(runif(n()))` line completely randomizes the order. So the halves that get assigned are random because the order has been randomized. Then, as the comment says, the final `arrange()` gets it back to the original order. – Gregor Thomas May 05 '20 at 02:48
  • This code mostly works but it doesn't solve the problem exactly. All of trt2 are assigned "y" as expected, but the assignments of "y" and "n" to trt1 and trt3 are not correct. Instead of 2 "n" and 2 "y" for each of trt1 and trt3, there is 1 "y" and 3 "n" for each. My best understanding of what is happening here is that out of the 12 total rows, only two "y" are being assigned to trt1 and trt3 per replicate because 4 "y" have already been assigned to trt2, making 4 + 2 = 6 "y", resulting in 6 "n" being assigned instead of 4 as anticipated. Any suggestion on what to change to resolve this? – Todd D. Johnson May 05 '20 at 17:53
  • It's a dumb mistake on my part: I used `row_number() < n() / 2`, but I should have used `<=`. Edits made. The "already assigned" bit you mention isn't an issue because of the `group_by`, since everything happens within the `Site Replicate Treatment` subgroups. – Gregor Thomas May 05 '20 at 18:08
  • No worries, I struggle with syntax and I'm still a beginner with dplyr so I wouldn't have known otherwise! That resolved that issue and I have accepted your answer. – Todd D. Johnson May 05 '20 at 18:52
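
The off-by-one fix discussed in the comments can be checked with plain vectors. This is a small sketch, not part of the original answer: `idx` stands in for what `row_number()` returns inside a single group, and the variable names are made up for illustration.

```r
n <- 4            # size of one Site/Replicate/Treatment group
idx <- seq_len(n) # stand-in for row_number() inside that group

n_y_strict <- sum(idx < n / 2)  # original comparison: 1 "y", 3 "n"
n_y_fixed  <- sum(idx <= n / 2) # corrected comparison: 2 "y", 2 "n"

# With an odd group size an even split is impossible;
# `<=` rounds down, so the extra unit gets "n".
n_odd <- 5
n_y_odd <- sum(seq_len(n_odd) <= n_odd / 2) # 2 "y", 3 "n"
```

The odd-size case matters if some treatment groups in the real data are unbalanced, since the split then gives one more "n" than "y" in those groups.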