I have a dataframe with two columns.
The first column contains unique names for clusters of samples (from a network) - one row per unique cluster name.
The second column contains the names of the samples that belong to each cluster, separated by commas within a single cell. Clusters have differing numbers of samples.
I would like to have a single row for each sample name, with its cluster name in the column beside it. I have tried the melt() function, but have not gotten the result I need.
Here's what I have:
clusterNo <- c("cluster1", "cluster2", "cluster3")
membership <- c("sample1, sample2, sample3", "sample4, sample5", "sample6, sample7, sample8, sample9")
df <- data.frame(clusterNo, membership, stringsAsFactors = FALSE); df
  clusterNo                         membership
1  cluster1          sample1, sample2, sample3
2  cluster2                   sample4, sample5
3  cluster3 sample6, sample7, sample8, sample9
Here's what I want to end up with:
clusterNo <- c("cluster1", "cluster1", "cluster1", "cluster2", "cluster2", "cluster3", "cluster3", "cluster3", "cluster3")
membership <- c("sample1", "sample2", "sample3", "sample4", "sample5", "sample6", "sample7", "sample8", "sample9")
df2 <- data.frame(clusterNo, membership, stringsAsFactors = FALSE); df2
  clusterNo membership
1  cluster1    sample1
2  cluster1    sample2
3  cluster1    sample3
4  cluster2    sample4
5  cluster2    sample5
6  cluster3    sample6
7  cluster3    sample7
8  cluster3    sample8
9  cluster3    sample9
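For reference, the reshaping described above could be sketched in base R roughly as follows. This is only one possible approach, and it assumes the separator is always a comma followed by optional whitespace and that no cell is empty or NA. The names members and long are illustrative, not part of the original data.

```r
# Example data from the question, rebuilt so the block runs on its own
df <- data.frame(
  clusterNo  = c("cluster1", "cluster2", "cluster3"),
  membership = c("sample1, sample2, sample3",
                 "sample4, sample5",
                 "sample6, sample7, sample8, sample9"),
  stringsAsFactors = FALSE
)

# Split each membership string on a comma plus any following whitespace
# (assumption: this is the only separator used); result is a list of vectors
members <- strsplit(df$membership, ",\\s*")

# Repeat each cluster name once per member, then flatten the member list
long <- data.frame(
  clusterNo  = rep(df$clusterNo, lengths(members)),
  membership = unlist(members),
  stringsAsFactors = FALSE
)
long
```

If the tidyr package is available, its separate_rows() function is aimed at the same kind of one-row-per-value reshaping and may be worth a look as an alternative.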
Thanks for your thoughts.