
I have a dataframe with two columns.

The first column contains a unique name for each cluster of samples (from a network), with one row per cluster.

The second column contains the names of the samples that belong to each cluster, stored as a single comma-separated string. Clusters contain different numbers of samples.

I would like a single row for each sample name, with its cluster name in the column beside it. I have tried the `melt()` function, but haven't gotten what I need.

Here's what I have:

    clusterNo <- c("cluster1", "cluster2", "cluster3")
    membership <- c("sample1, sample2, sample3", "sample4, sample5", "sample6, sample7, sample8, sample9")
    df <- data.frame(clusterNo, membership)
    df


      clusterNo                         membership
    1  cluster1          sample1, sample2, sample3
    2  cluster2                   sample4, sample5
    3  cluster3 sample6, sample7, sample8, sample9

Here's my destination:

    clusterNo <- c("cluster1", "cluster1", "cluster1", "cluster2", "cluster2", "cluster3", "cluster3", "cluster3", "cluster3")
    membership <- c("sample1", "sample2", "sample3", "sample4", "sample5", "sample6", "sample7", "sample8", "sample9")
    df2 <- data.frame(clusterNo, membership)
    df2


      clusterNo membership
    1  cluster1    sample1
    2  cluster1    sample2
    3  cluster1    sample3
    4  cluster2    sample4
    5  cluster2    sample5
    6  cluster3    sample6
    7  cluster3    sample7
    8  cluster3    sample8
    9  cluster3    sample9
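
For reference, one base-R way to get from `df` to `df2` without `melt()` is to split each membership string into its sample names and repeat each cluster name once per member. This is a minimal sketch that assumes every separator is a comma optionally followed by whitespace; the intermediate variable `members` is just an illustrative name.

```r
# Rebuild the example input from the question
clusterNo <- c("cluster1", "cluster2", "cluster3")
membership <- c("sample1, sample2, sample3", "sample4, sample5",
                "sample6, sample7, sample8, sample9")
df <- data.frame(clusterNo, membership, stringsAsFactors = FALSE)

# Split each comma-separated string into a character vector of sample names;
# strsplit() returns a list with one element per row of df
members <- strsplit(df$membership, ",\\s*")

# Repeat each cluster name once per member, then flatten the member list
df2 <- data.frame(
  clusterNo  = rep(df$clusterNo, lengths(members)),
  membership = unlist(members),
  stringsAsFactors = FALSE
)
df2
```

`lengths(members)` gives the number of samples in each cluster, so `rep()` lines the cluster names up with the flattened sample names row for row.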

Thanks for your thoughts.

ptenax
  • Looks like you can use `df %>% tidyr::separate_longer_delim(membership, delim=", ")`. The `tidyr` package has a bunch of functions that supersede the old `cast/melt` functions – MrFlick Aug 10 '23 at 15:13
  • Thanks again @MrFlick. Looks like my question already has answers; I just wasn't searching with the right keywords. – ptenax Aug 10 '23 at 15:14
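
MrFlick's suggestion can be written out as a self-contained sketch. It assumes tidyr 1.3.0 or later (the version that added `separate_longer_delim()`) and that members are separated by exactly `", "`; `delim` is matched as a fixed string, not a regular expression. The function is called directly here, which is equivalent to the `df %>% ...` pipe form in the comment.

```r
# Assumes tidyr >= 1.3.0, where separate_longer_delim() is available
library(tidyr)

df <- data.frame(
  clusterNo  = c("cluster1", "cluster2", "cluster3"),
  membership = c("sample1, sample2, sample3", "sample4, sample5",
                 "sample6, sample7, sample8, sample9"),
  stringsAsFactors = FALSE
)

# Split membership on ", " and give each piece its own row;
# the other column (clusterNo) is repeated to match
df2 <- separate_longer_delim(df, membership, delim = ", ")
df2
```

On older tidyr versions, `separate_rows(df, membership, sep = ", ")` does the same job, although its `sep` argument is treated as a regular expression.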
