I have a data frame consisting of an ID column, a clones column, and an 'Isolate' column.

Each ID appears multiple times in the ID column and is associated with different clones in the clones column (clone1, clone2, clone3, etc.), which come from distinct isolates. An ID may also have the same clone multiple times.

e.g.

ID  clones  Isolate
ID1 clone1    1
ID1 clone1    2 
ID1 clone1    3 
ID2 clone1    4
ID2 clone1    5
ID2 clone2    6
ID2 clone2    7
ID3 clone1    8
ID3 clone1    9
ID3 clone2    10
ID3 clone3    11
ID3 clone3    12

For each unique ID, I want to select one representative of each clone at random.

I expect to get an output like this:

ID  clones   Isolate
ID1 clone1      2
ID2 clone1      5
ID2 clone2      6
ID3 clone1      8
ID3 clone2     10
ID3 clone3     12

Each ID should have one representative per clone, chosen at random, so the values in the Isolate column are random.

Phoebe
  • I'm not sure if I understand you correctly. Is `dplyr::distinct()` what you want? – yusuzech Jul 24 '19 at 19:48
  • just do a `unique(df1)` – akrun Jul 24 '19 at 19:48
  • So in my example, ID3 in the ID column has two clone1, one clone2, and two clone3. I want a representative for each clone of each ID, selected at random, so for ID3 I want to select one clone1, one clone2 and one clone3 at random. I want to do this for every distinct ID – Phoebe Jul 24 '19 at 19:51
  • Sorry I wasn't clear: my data frame has other columns as well, which differ in every row, so using unique doesn't work. I will edit the question to show this – Phoebe Jul 24 '19 at 19:55

1 Answer

It seems like you can use the answers to a similar question asked just now: "How to use R to identify twins, and then randomly select and remove one?"

If you group by ID and clones with dplyr's `group_by()` and then call `sample_n(1)`, you should get exactly one randomly chosen row for each ID and clone pair. Borrowing from @Andrew Gustar's answer:

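library(dplyr)

df %>% 
  group_by(ID, clones) %>%  # one group per ID/clone pair
  sample_n(1) %>%           # draw one random row from each group
  ungroup()                 # drop the grouping from the result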
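To try this on the example from the question, here is a minimal sketch that rebuilds the sample data and runs the same pipeline. The object names `df` and `picked` and the seed value are my own choices for illustration; the seed is only there so the draw can be repeated. If your real data frame has extra columns, they are carried along with whichever row gets sampled.

```r
library(dplyr)

# Example data copied from the question
df <- data.frame(
  ID      = c(rep("ID1", 3), rep("ID2", 4), rep("ID3", 5)),
  clones  = c(rep("clone1", 3),
              "clone1", "clone1", "clone2", "clone2",
              "clone1", "clone1", "clone2", "clone3", "clone3"),
  Isolate = 1:12
)

set.seed(42)  # arbitrary seed (assumption), only to make the random draw repeatable

picked <- df %>%
  group_by(ID, clones) %>%  # one group per ID/clone pair
  sample_n(1) %>%           # one random row per group
  ungroup()

print(picked)
```

The result should have six rows, one per ID/clone pair, with the Isolate value varying between runs if you change or drop the seed. In dplyr 1.0.0 and later, `slice_sample(n = 1)` is the newer replacement for `sample_n(1)` and should behave the same way here.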
Mike
  • If you think this is a duplicate of an existing question, it's better to flag it as such instead of adding a duplicate answer – camille Jul 24 '19 at 21:10