
I am trying to write code that automatically scrambles the columns of multiple 96-well plates.

Here is a small example of the data:

library(dplyr)

plate <- data.frame(column = rep(c(rep("02", 4), rep("03", 4), rep("04", 4)), 2),
                    row = rep(c("A", "B", "C", "D"), 6),
                    plate_id = c(rep("Plate_1", 12), rep("Plate_2", 12)),
                    compound = 1:24) %>%
  mutate(well_id = paste0(row, column))

   column row plate_id compound well_id
1      02   A  Plate_1        1     A02
2      02   B  Plate_1        2     B02
3      02   C  Plate_1        3     C02
4      02   D  Plate_1        4     D02
5      03   A  Plate_1        5     A03
6      03   B  Plate_1        6     B03
7      03   C  Plate_1        7     C03
8      03   D  Plate_1        8     D03
9      04   A  Plate_1        9     A04
10     04   B  Plate_1       10     B04
11     04   C  Plate_1       11     C04
12     04   D  Plate_1       12     D04
13     02   A  Plate_2       13     A02
14     02   B  Plate_2       14     B02
15     02   C  Plate_2       15     C02
16     02   D  Plate_2       16     D02
17     03   A  Plate_2       17     A03
18     03   B  Plate_2       18     B03
19     03   C  Plate_2       19     C03
20     03   D  Plate_2       20     D03
21     04   A  Plate_2       21     A04
22     04   B  Plate_2       22     B04
23     04   C  Plate_2       23     C04
24     04   D  Plate_2       24     D04

What I have done so far is:

all_col <- plate$column %>% unique()
col_list <- rep(list(all_col), plate$plate_id %>% unique() %>% length())
set.seed(2248)
random_col_list <- lapply(col_list, function(x) sample(x))
names(random_col_list) <- plate$plate_id %>% unique()

plate_randomized <- plate %>% #there has to be a better way...
  mutate(newcol = case_when(column == all_col[1] & plate_id == "Plate_1" ~ random_col_list$Plate_1[1],
                            column == all_col[2] & plate_id == "Plate_1" ~ random_col_list$Plate_1[2],
                            column == all_col[3] & plate_id == "Plate_1" ~ random_col_list$Plate_1[3],
                            column == all_col[1] & plate_id == "Plate_2" ~ random_col_list$Plate_2[1],
                            column == all_col[2] & plate_id == "Plate_2" ~ random_col_list$Plate_2[2],
                            column == all_col[3] & plate_id == "Plate_2" ~ random_col_list$Plate_2[3]),
         new_id = paste0(row, newcol))

   column row plate_id compound well_id newcol new_id
1      02   A  Plate_1        1     A02     03    A03
2      02   B  Plate_1        2     B02     03    B03
3      02   C  Plate_1        3     C02     03    C03
4      02   D  Plate_1        4     D02     03    D03
5      03   A  Plate_1        5     A03     02    A02
6      03   B  Plate_1        6     B03     02    B02
7      03   C  Plate_1        7     C03     02    C02
8      03   D  Plate_1        8     D03     02    D02
9      04   A  Plate_1        9     A04     04    A04
10     04   B  Plate_1       10     B04     04    B04
11     04   C  Plate_1       11     C04     04    C04
12     04   D  Plate_1       12     D04     04    D04
13     02   A  Plate_2       13     A02     04    A04
14     02   B  Plate_2       14     B02     04    B04
15     02   C  Plate_2       15     C02     04    C04
16     02   D  Plate_2       16     D02     04    D04
17     03   A  Plate_2       17     A03     02    A02
18     03   B  Plate_2       18     B03     02    B02
19     03   C  Plate_2       19     C03     02    C02
20     03   D  Plate_2       20     D03     02    D02
21     04   A  Plate_2       21     A04     03    A03
22     04   B  Plate_2       22     B04     03    B03
23     04   C  Plate_2       23     C04     03    C03
24     04   D  Plate_2       24     D04     03    D03

This gets me what I want; however, ideally I would be able to "loop" over the indices of all_col and random_col_list and over each plate_id, as this is a large dataset.

This is similar to this question, but the answer doesn't explain how to use purrr::map and !!, !!!, so I don't know how to apply the answer to this problem.

tinyteeth
  • I think what you're looking for is `reduce()` from `purrr`. `list(df1,df2,df3) %>% reduce(fun,parm)` – Jacky Dec 10 '19 at 22:11

2 Answers


If you want to use purrr and dplyr, do this:

First write a function that does what you want:

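# Look up the shuffled column label for one well
newcol <- function(all_col_list, col, plate, output) {
    col_index <- which(all_col_list %in% col)  # position of the original column
    var <- output[[plate]][col_index]          # shuffled label for this plate
    return(as.character(var))
}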

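Then use mutate with map2 (map2_chr here, so that newcol is a character column rather than a list column):

library(dplyr)
library(purrr)

plate_randomized <- plate %>%  #there is a better way...
  mutate(newcol = map2_chr(column, plate_id, ~ newcol(all_col, .x, .y, random_col_list)),
         new_id = paste0(row, newcol))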

   column row plate_id compound well_id newcol new_id
1      02   A  Plate_1        1     A02     02    A02
2      02   B  Plate_1        2     B02     02    B02
3      02   C  Plate_1        3     C02     02    C02
4      02   D  Plate_1        4     D02     02    D02
5      03   A  Plate_1        5     A03     04    A04
6      03   B  Plate_1        6     B03     04    B04
7      03   C  Plate_1        7     C03     04    C04
8      03   D  Plate_1        8     D03     04    D04
9      04   A  Plate_1        9     A04     03    A03
10     04   B  Plate_1       10     B04     03    B03
11     04   C  Plate_1       11     C04     03    C03
12     04   D  Plate_1       12     D04     03    D03
13     02   A  Plate_2       13     A02     04    A04
14     02   B  Plate_2       14     B02     04    B04
15     02   C  Plate_2       15     C02     04    C04
16     02   D  Plate_2       16     D02     04    D04
17     03   A  Plate_2       17     A03     03    A03
18     03   B  Plate_2       18     B03     03    B03
19     03   C  Plate_2       19     C03     03    C03
20     03   D  Plate_2       20     D03     03    D03
21     04   A  Plate_2       21     A04     02    A02
22     04   B  Plate_2       22     B04     02    B02
23     04   C  Plate_2       23     C04     02    C02
24     04   D  Plate_2       24     D04     02    D02
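
To make the `.x` and `.y` placeholders in the `map2` call above concrete, here is a minimal sketch on toy vectors (`cols` and `plates` are made up for illustration and are not part of the question's data). In a `~` formula, `.x` is the current element of the first vector and `.y` is the matching element of the second; `map2_chr` is the variant that returns a character vector instead of a list.

```r
library(purrr)

# Toy inputs, invented for illustration only
cols   <- c("02", "03", "04")
plates <- c("Plate_1", "Plate_1", "Plate_2")

# .x walks along `cols`, .y along `plates`, one pair at a time
labels <- map2_chr(cols, plates, ~ paste(.y, .x, sep = ":"))
labels

# The formula is shorthand for an ordinary two-argument function
labels_fn <- map2_chr(cols, plates,
                      function(col, plate) paste(plate, col, sep = ":"))
identical(labels, labels_fn)
```

In the answer's call, `column` plays the role of `cols` and `plate_id` the role of `plates`, so the lookup function is run once per row.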
Kay
  • This is great and is exactly what I was hoping for! I think `map2` is what is confusing me. Where do the `.x` and `.y` come from? I'm assuming it's referring to the arguments of `map2`? – tinyteeth Dec 11 '19 at 18:28
  • @tinyteeth You're right. They refer to the arguments of `map2`. `map2` is specialized for 2 arguments. For more arguments, `pmap` is what you need. As the documentation [here](https://purrr.tidyverse.org/reference/map2.html) says: for one argument, use `.`; for 2 arguments, `.x` and `.y`; and for more arguments, use `..1, ..2, ..3` etc. You can accept the answer if it gives you exactly what you want. – Kay Dec 11 '19 at 19:07

I guess the dplyr way would be to shuffle the order of the levels of column and then re-sort the vector. We write a function to do this within one subset:

library(purrr)
library(tidyr)
library(dplyr)

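swap_col <- function(df) {
  df %>%
    # shuffle the factor levels, so sorting follows the shuffled order
    mutate(newcol = factor(column, levels = sample(unique(column)))) %>%
    # sorted labels are laid out in blocks along the existing row order
    mutate(newcol = as.character(sort(newcol))) %>%
    mutate(newid = paste0(row, newcol))
}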

We can test this on one subset:

swap_col(subset(plate,plate_id=="Plate_1"))

   column row plate_id compound well_id newcol newid
1      02   A  Plate_1        1     A02     02   A02
2      02   B  Plate_1        2     B02     02   B02
3      02   C  Plate_1        3     C02     02   C02
4      02   D  Plate_1        4     D02     02   D02
5      03   A  Plate_1        5     A03     04   A04
6      03   B  Plate_1        6     B03     04   B04
7      03   C  Plate_1        7     C03     04   C04
8      03   D  Plate_1        8     D03     04   D04
9      04   A  Plate_1        9     A04     03   A03
10     04   B  Plate_1       10     B04     03   B03
11     04   C  Plate_1       11     C04     03   C03
12     04   D  Plate_1       12     D04     03   D03

Now we use purrr to apply it to each subset:

plate %>% split(.$plate_id) %>% map_dfr(swap_col)
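
One caveat: because `sort()` lays the shuffled labels out in blocks, `swap_col` relies on each plate's rows being ordered by `column` with the same number of wells per column. A sketch of a variant that does not depend on row order is below. It assumes a layout like the question's `plate` (a small stand-in is built here so the block runs on its own), and the helper name `shuffle_cols` is hypothetical. It builds a named lookup per plate instead of sorting:

```r
library(dplyr)

# Small stand-in for the question's data (2 plates, 3 columns, 2 rows each)
plate <- expand.grid(row = c("A", "B"), column = c("02", "03", "04"),
                     plate_id = c("Plate_1", "Plate_2"),
                     stringsAsFactors = FALSE)

# Hypothetical helper: map each distinct label to a shuffled label
shuffle_cols <- function(x) {
  old <- unique(x)
  lookup <- setNames(sample(old), old)  # names = old labels, values = new labels
  unname(lookup[x])
}

set.seed(1)
plate_randomized <- plate %>%
  group_by(plate_id) %>%                 # shuffle independently per plate
  mutate(newcol = shuffle_cols(column),
         new_id = paste0(row, newcol)) %>%
  ungroup()
```

Since the lookup is keyed by the original label, every well from the same original column lands in the same new column regardless of how the rows are sorted.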
StupidWolf