Problem
How can I create an index column in an R data frame based on the categorical values in another column?
For example, suppose we have the following data frame:
id cat
1 A
2 A
3 A
4 B
5 B
6 C
7 C
8 C
9 C
10 C
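For anyone who wants to try this, here is a minimal sketch that builds the example data above as a reproducible data frame. It assumes `cat` is stored as a character column rather than a factor; the table does not specify this.

```r
# Example data from the table above (assumption: cat is character, not factor)
df <- data.frame(
  id  = 1:10,
  cat = c("A", "A", "A", "B", "B", "C", "C", "C", "C", "C"),
  stringsAsFactors = FALSE
)

# Group sizes: A = 3, B = 2, C = 5
table(df$cat)
```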
How can we create a column called rank that numbers the rows within each cat group, restarting at 1 for each new category, like this:
id cat rank
1 A 1
2 A 2
3 A 3
4 B 1
5 B 2
6 C 1
7 C 2
8 C 3
9 C 4
10 C 5
Attempts
Assume the data frame is called df. I tried the following:
- aggregate(df, by = c('A','B','C'), length)
- Started writing a custom function to use with lapply, but ran into too many edge cases.
The aggregate call gave me mismatched-length errors. The idea was to get the count for each group and then write a function that, via lapply, takes each row and keeps counting until it reaches that group's count.
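A sketch of how the count-then-enumerate idea could work, assuming the rows are already sorted by cat (as in the example). In `aggregate`, the `by` argument expects a list of grouping vectors, not the category labels themselves; passing `c('A','B','C')` is likely what triggered the length mismatch. Each group size n can then be expanded to 1..n with `seq_len`.

```r
df <- data.frame(
  id  = 1:10,
  cat = c("A", "A", "A", "B", "B", "C", "C", "C", "C", "C"),
  stringsAsFactors = FALSE
)

# `by` is a list of grouping vectors; the result has columns `cat` and `x`,
# where `x` holds the group sizes in sorted category order (A, B, C)
counts <- aggregate(df$id, by = list(cat = df$cat), FUN = length)

# Expand each group size n into 1..n and concatenate.
# Assumption: df is sorted by cat in the same order as `counts`.
df$rank <- unlist(lapply(counts$x, seq_len))
```

If the rows were not grouped contiguously, this would silently assign wrong ranks, which is one reason the boundary cases get tricky.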
Additional Notes
I am thinking about abandoning the ideas above and instead splitting the data frame into smaller data frames, one per cat value. I would then add a rank column to each one that simply indexes its rows. The challenge then becomes: what is a good way to combine those pieces back into a single data frame with the new rank column?
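The split-and-recombine plan could be sketched in base R as below, using `split`, `lapply`, and `do.call(rbind, ...)`. This is an illustration under the assumption that `id` reflects the original row order, so sorting by it afterwards restores that order.

```r
df <- data.frame(
  id  = 1:10,
  cat = c("A", "A", "A", "B", "B", "C", "C", "C", "C", "C"),
  stringsAsFactors = FALSE
)

# One data frame per category
pieces <- split(df, df$cat)

# Within each piece, rank is just the row position 1..nrow
pieces <- lapply(pieces, function(d) {
  d$rank <- seq_len(nrow(d))
  d
})

# Stack the pieces back into a single data frame
result <- do.call(rbind, pieces)
rownames(result) <- NULL  # drop names like "A.1" created by rbind

# split() groups rows by category; reorder by id (assumed original order)
result <- result[order(result$id), ]
result
```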
None of this sits quite right with me, though. Frankly, my gut says I'm going about this all wrong. Am I making this too hard? Is there a package or R trick that does this easily? I apologize if this seems silly, but I cannot in good conscience proceed further without seeking the advice of R programmers more skilled than me.