I have a data frame that lists every exon together with the gene it belongs to. The current exon names do not indicate their order. I have already sorted the rows by starting genomic position, so now I just need to generate a column that numbers the exons sequentially within each gene.

Example of the top of the data frame:

GENE, EXON
GENE1, "789"
GENE1, "953"
GENE1, "102"
GENE2, "43024"
GENE3, "542"
GENE3, "047"

So this is what I want my data frame to look like:

GENE, EXON, genomic order
GENE1, "789", 1
GENE1, "953", 2
GENE1, "102", 3
GENE2, "43024", 1
GENE3, "542", 1
GENE3, "047", 2

How do I create a column that numbers rows sequentially within each group defined by another column (here, GENE)?

2 Answers

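You can try the code below, which uses ave():

# Number the rows 1, 2, ... within each GENE group, keeping the existing row order
transform(df, Order = ave(seq_len(nrow(df)), GENE, FUN = seq_along))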
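
To see what the ave() call does, here is a minimal sketch on the example rows from the question. The data frame construction is an assumption (the question only shows the printed rows), and it presumes the rows are already sorted by genomic start within each gene.

```r
# Example data rebuilt from the question (assumed layout: one row per exon,
# already sorted by genomic start position within each gene).
df <- data.frame(
  GENE = c("GENE1", "GENE1", "GENE1", "GENE2", "GENE3", "GENE3"),
  EXON = c("789", "953", "102", "43024", "542", "047"),
  stringsAsFactors = FALSE
)

# ave() splits the row indices by GENE, applies seq_along() to each group,
# and writes the results back in the original row order.
df$Order <- ave(seq_len(nrow(df)), df$GENE, FUN = seq_along)

print(df)
# Order column: 1 2 3 1 1 2
```

Because ave() returns a vector of the same length and order as its input, the counter restarts at 1 for each gene without reordering the rows.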
ThomasIsCoding

With data.table:

library(data.table)
# Convert df to a data.table by reference, then add a per-GENE row counter
setDT(df)[, Order := rowid(GENE)]
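
As a rough illustration on the question's example rows (the table construction and the name dt are assumptions, and the data.table package must be installed), rowid() gives the same numbering as a grouped seq_len(.N):

```r
library(data.table)

# Example data rebuilt from the question (assumed to be pre-sorted by
# genomic start position within each gene).
dt <- data.table(
  GENE = c("GENE1", "GENE1", "GENE1", "GENE2", "GENE3", "GENE3"),
  EXON = c("789", "953", "102", "43024", "542", "047")
)

# rowid() numbers each occurrence of a GENE value in order of appearance.
dt[, Order := rowid(GENE)]

# Equivalent grouped form, shown only for comparison.
dt[, Order2 := seq_len(.N), by = GENE]

print(dt)
# Order and Order2 columns: 1 2 3 1 1 2
```

Both forms add the column by reference, so no reassignment of dt is needed.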
akrun