
I would like to assign groups to larger groups (clusters) so that I can assign them to cores for processing. I have 16 cores. This is what I have so far:

test <- data_extract %>% group_by(group_id) %>% sample_n(16, replace = TRUE)

This takes samples of 16 rows from each group.

This is an example of what I would like the final product to look like (with two clusters). All I really want is for every row with the same group id to belong to the same cluster, with a set number of clusters.

________________________________
balance   | group_id |  cluster|
454452    | a        |  1      |
5450441   | a        |  1      |
5444531   | b        |  1      |
5404051   | b        |  1      |
5404501   | b        |  1      |
5404041   | b        |  1      |
544251    | b        |  1      |
254252    | b        |  1      |
541254    | c        |  2      |
54123254  | d        |  1      |
542541    | d        |  1      |
5442341   | e        |  2      |
541       | f        |  1      |
________________________________
Dominic Naimool
  • Is this your expected output or input? – akrun Dec 12 '19 at 14:54
  • This is my expected output – Dominic Naimool Dec 12 '19 at 15:04
  • ok, sorry, without an input example, it is difficult to test – akrun Dec 12 '19 at 15:05
  • The very best would be to provide input, actual output, and expected output. – Dan Chaltiel Dec 12 '19 at 15:07
  • The input would be the data provided without the column 'Cluster' (sorry for the confusion) – Dominic Naimool Dec 12 '19 at 15:13
  • We really need a reproducible example to help you. If your dataset is not too big, please write the output of `dput(data_extract)`. Else, select relevant columns and lines before (`dplyr::select` and `dplyr::filter`). – Dan Chaltiel Dec 12 '19 at 15:22
  • This is the problem, my actual data contains a couple million rows and each group contains from a few hundred to a few thousand – Dominic Naimool Dec 12 '19 at 15:33
  • I suggest you read this https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example in order to make a reproducible example – K Y Dec 12 '19 at 15:36
  • `data_extract %>% select(group_id, balance, clusterOrAnotherColumn) %>% head(20) %>% dput`. If the head does not contain enough groups, select some lines that do. Give us whatever we can use as input to test some answers. – Dan Chaltiel Dec 12 '19 at 15:46

1 Answer


test <- data %>% group_by(group_id) %>% mutate(group = sample(1:16, 1))
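For anyone who wants to check this on a small input, here is a minimal sketch. It assumes dplyr is loaded and uses a few rows copied from the table in the question; the names `n_clusters`, `clustered`, `clusters_per_group` and the output column `cluster` are my own choices for illustration, not anything from the real data.

```r
library(dplyr)

# Toy input: some rows from the question's table, without the cluster column
data_extract <- tibble(
  balance  = c(454452, 5450441, 5444531, 5404051, 541254,
               54123254, 542541, 5442341, 541),
  group_id = c("a", "a", "b", "b", "c", "d", "d", "e", "f")
)

n_clusters <- 2  # hypothetical parameter; set to 16 to match the number of cores

set.seed(1)  # only so the random draw is repeatable
clustered <- data_extract %>%
  group_by(group_id) %>%
  # sample() is evaluated once per group and returns a single value,
  # which mutate() recycles to every row of that group
  mutate(cluster = sample(seq_len(n_clusters), 1)) %>%
  ungroup()

# Check: every group_id should map to exactly one cluster
clusters_per_group <- clustered %>%
  group_by(group_id) %>%
  summarise(n = n_distinct(cluster))
```

Because the cluster is drawn at random per group, the clusters will not necessarily end up with the same number of rows.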

Dominic Naimool
  • How is this question different from https://stackoverflow.com/questions/59310398/how-to-assign-a-number-between-1-and-n-in-r-to-rows/59310651#59310651 – Annet Dec 12 '19 at 18:49
  • Apparently it is not. I think describing the issue in terms of groups ended up being confusing, when in reality it does not matter, because group_by ends up treating each group as if it were a single row – Dominic Naimool Dec 12 '19 at 19:17