
My data looks like this:

DF <- structure(list(Gene = c("GeneA", "GeneB", "GeneC", "GeneD", "GeneE"), 
                 region = c("1:5914103-1:7245590","1:27403851-1:30161281","1:27403851-1:30161281","1:27403851-1:30161281","1:34800556-1:37548572")), 
                 .Names = c("Gene","region"), 
                 row.names = c(NA, 5L), 
                 class = "data.frame")

> DF
Gene                region
GeneA   1:5914103-1:7245590
GeneB 1:27403851-1:30161281
GeneC 1:27403851-1:30161281
GeneD 1:27403851-1:30161281
GeneE 1:34800556-1:37548572

I want to create a new column (clump) in my data frame (DF) that numbers the groups in another column (region) cumulatively, in order of first appearance, so that it would look like this:

> DF
Gene                region    clump
GeneA   1:5914103-1:7245590    1
GeneB 1:27403851-1:30161281    2
GeneC 1:27403851-1:30161281    2
GeneD 1:27403851-1:30161281    2
GeneE 1:34800556-1:37548572    3

This seemed like a fairly intuitive question, so I searched Stack Overflow at length for an existing answer. I found similar questions, but none covered the cumulative part: they asked about counting the rows in each group, or the unique values of another column within each group, rather than assigning each group a running index. I apologize in advance if this turns out to be a duplicate.

Thanks for any help!

Lynsey
  • `match(DF$region, unique(DF$region))` – Ronak Shah Jun 09 '20 at 12:24
  • Well THIS works! What on earth?! I've been stabbing away at dplyr for about 90mins trying to figure this out, yet the working solution is so beautifully simple! Thank you so much @Ronak – Lynsey Jun 09 '20 at 12:27
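
For anyone wondering why the `match()` one-liner from the comments works: `unique()` keeps the regions in order of first appearance, and `match()` returns each row's position in that vector, which is exactly the running group index. Below is a minimal sketch using the data from the question. The `cumsum()` variant and the name `clump_runs` are illustrative additions, not part of the original comment, and that variant only gives the same result under the assumption that identical regions always sit in adjacent rows.

```r
# Rebuild the example data from the question
DF <- data.frame(
  Gene = c("GeneA", "GeneB", "GeneC", "GeneD", "GeneE"),
  region = c("1:5914103-1:7245590", "1:27403851-1:30161281",
             "1:27403851-1:30161281", "1:27403851-1:30161281",
             "1:34800556-1:37548572"),
  stringsAsFactors = FALSE
)

# unique() lists each region once, in order of first appearance;
# match() gives the position of each row's region in that list
DF$clump <- match(DF$region, unique(DF$region))

# Illustrative alternative (assumes equal regions are in adjacent rows):
# start at 1 and add 1 every time the region differs from the previous row
clump_runs <- cumsum(c(TRUE, DF$region[-1] != DF$region[-nrow(DF)]))

print(DF)
```

The same expression can also be used inside `dplyr::mutate()`, e.g. `mutate(DF, clump = match(region, unique(region)))`, if a dplyr pipeline is preferred.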
