I have a large data frame, which I have subset here to simplify my question. It looks like this:
genome_ID cluster
p1.A2 1
p1.A2 3
p1.A2 3
p1.A2 4
p1.A3 2
p1.A4 2
p1.A5 1
p1.A5 3
I would like to add a column 'phages' to the data frame that numbers each occurrence of a genome_ID, restarting at 1 for every new genome_ID, i.e.:
genome_ID cluster phages
p1.A2 1 1
p1.A2 3 2
p1.A2 3 3
p1.A2 4 4
p1.A3 2 1
p1.A4 2 1
p1.A5 1 1
p1.A5 3 2
As you can see, the genome_ID p1.A2 is present four times, so its rows are numbered 1-4 in the column phages. p1.A5 is present twice, so its rows are numbered 1-2. If a genome_ID were present fifty times, I would like the column phages to number its rows 1-50. The order of the numbering within a genome_ID doesn't matter.
I need to do this so I can subset my dataset more easily to map it to a phylogeny (a biological tree showing evolutionary relationships).
If someone could give me insight into useful R packages and methods, that would be very helpful.
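For reference, the numbering described above can be sketched in base R without any extra packages. This is only a minimal sketch under stated assumptions: the data frame name df is hypothetical, and the data are just the eight example rows from above. It uses ave() to apply seq_along() within each genome_ID group.

```r
# Rebuild the example data from the question (df is a hypothetical name).
df <- data.frame(
  genome_ID = c("p1.A2", "p1.A2", "p1.A2", "p1.A2",
                "p1.A3", "p1.A4", "p1.A5", "p1.A5"),
  cluster   = c(1, 3, 3, 4, 2, 2, 1, 3),
  stringsAsFactors = FALSE
)

# ave() splits the first argument by genome_ID and applies FUN to each piece;
# seq_along() then numbers the rows of each group 1..n in their current order.
df$phages <- ave(seq_along(df$genome_ID), df$genome_ID, FUN = seq_along)

print(df)
```

With the column in place, something like df[df$phages == 1, ] would keep one row per genome_ID. If packages are an option, dplyr (group_by() followed by mutate() with row_number()) and data.table (rowid()) offer similar per-group counters, though I have not compared them on the full dataset.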