I have a dataframe that I'd like to set up for a pivot wider (will explain end-goal at end of main question, in case I'm maybe going about this in a convoluted way and someone can point out a smarter solution). For that, I'd like to add a column to my dataframe that increments every time a value repeats within a column. For example, if I had the following dataframe:
my_df1 <- data.frame(col1 = c("A","D","B","E","A","G","G","E","G"),
col2 = c(1, 3, 5, 4, 3, 1, 1, 5, 5))
my_df1
col1 col2
1 A 1
2 D 3
3 B 5
4 E 4
5 A 3
6 G 1
7 G 1
8 E 5
9 G 5
then I would like to add a third column that increments every time a value "repeats" in col1, indicating the number of times a given value has occurred so far in col1. My desired dataframe would be as follows:
my_df2
col1 col2 col3
1 A 1 1
2 D 3 1
3 B 5 1
4 E 4 1
5 A 3 2
6 G 1 1
7 G 1 2
8 E 5 2
9 G 5 3
So the first time a value appears in col1, the increment starts at 1. And, for example, when "A" is seen for the second time in the column (in row5), col3 increments to 2. To my knowledge, this is similar to the row_number function in SQL. Would someone be able to advise what is the best way to increment like this?
The point of this exercise is to setup a dataframe at my workplace for a pivot_wider. With the desired dataframe, my code and output would be something as follows:
my_df2_wide <- pivot_wider(data = my_df2,
names_from = col3,
names_prefix = "occurrence_",
values_from = col2)
my_df2_wide
# A tibble: 6 x 4
col1 occurrence_1 occurrence_2 occurrence_3
<chr> <dbl> <dbl> <dbl>
1 A 1 3 NA
2 D 3 NA NA
3 B 5 NA NA
4 E 4 NA NA
5 G 1 1 5
6 C NA 5 NA