I have a data frame which is similar to the output result in this answered questionabout aggregate data by group
but I would like to create a column for new IDs based on the count of unique cluster in DF
DF = read.table(text="name cluster count
F00851.3 20 2
F00851.2 20 2
F00851 20 2
F00851.8 20 2
F00851.4 20 2
F00851.5 20 2
F00851.1 20 2
F00851.6 21 2
F00851.7 21 2
F00958.2 23 1
F00958.1 23 1
F00958.3 23 1
F00958 23 1
F01404.5 28 3
F01404 28 3
F01404.4 28 3
F01404.3 28 3
F01404.6 29 3
F01404.1 29 3
F01404.7 29 3
F01404.2 30 3
F01404.8 30 3", header=T, stringsAsFactors=F)
The expected result:
result = read.table(text="name cluster count ID
F00851.3 20 2 F00851.1
F00851.2 20 2 F00851.1
F00851 20 2 F00851.1
F00851.8 20 2 F00851.1
F00851.4 20 2 F00851.1
F00851.5 20 2 F00851.1
F00851.1 20 2 F00851.1
F00851.6 21 2 F00851.2
F00851.7 21 2 F00851.2
F00958.2 23 1 F00958.1
F00958.1 23 1 F00958.1
F00958.3 23 1 F00958.1
F00958 23 1 F00958.1
F01404.5 28 3 F01404.1
F01404 28 3 F01404.1
F01404.4 28 3 F01404.1
F01404.3 28 3 F01404.1
F01404.6 29 3 F01404.2
F01404.1 29 3 F01404.2
F01404.7 29 3 F01404.2
F01404.2 30 3 F01404.3
F01404.8 30 3 F01404.3", header=T, stringsAsFactors=F)
In my case, the group is substr(DF$name,1,6)
. So the new column ID should be substr(DF$name,1,6)
plus extension separated by dot. the extension number is the series number of unique values in cluster
column for each group.
Appreciate any helps.