1

I have a dataframe that I'd like to set up for a pivot wider (will explain end-goal at end of main question, in case I'm maybe going about this in a convoluted way and someone can point out a smarter solution). For that, I'd like to add a column to my dataframe that increments every time a value repeats within a column. For example, if I had the following dataframe:

my_df1 <- data.frame(col1 = c("A","D","B","E","A","G","G","E","G"),
                    col2 = c(1, 3, 5, 4, 3, 1, 1, 5, 5))
my_df1
  col1 col2
1    A    1
2    D    3
3    B    5
4    E    4
5    A    3
6    G    1
7    G    1
8    E    5
9    G    5

then I would like to add a third column that increments every time a value "repeats" in col1, indicating the number of times a given value has occurred so far in col1. My desired dataframe would be as follows:

my_df2
  col1 col2 col3
1    A    1    1
2    D    3    1
3    B    5    1
4    E    4    1
5    A    3    2
6    G    1    1
7    G    1    2
8    E    5    2
9    G    5    3

So the first time a value appears in col1, the increment starts at 1. And, for example, when "A" is seen for the second time in the column (in row5), col3 increments to 2. To my knowledge, this is similar to the row_number function in SQL. Would someone be able to advise what is the best way to increment like this?

The point of this exercise is to setup a dataframe at my workplace for a pivot_wider. With the desired dataframe, my code and output would be something as follows:

my_df2_wide <- pivot_wider(data = my_df2,
                           names_from = col3,
                           names_prefix = "occurrence_",
                           values_from = col2)

my_df2_wide
# A tibble: 6 x 4
  col1  occurrence_1 occurrence_2 occurrence_3
  <chr>        <dbl>        <dbl>        <dbl>
1 A                1            3           NA
2 D                3           NA           NA
3 B                5           NA           NA
4 E                4           NA           NA
5 G                1            1            5
6 C               NA            5           NA
Uwe Keim
  • 39,551
  • 56
  • 175
  • 291
cstrudels
  • 17
  • 3

1 Answers1

3

Does this work:

library(dplyr)
my_df1 %>% group_by(col1) %>% mutate(col3 = row_number()) %>% ungroup()
# A tibble: 9 x 3
# Groups:   col1 [5]
  col1   col2  col3
  <chr> <dbl> <int>
1 A         1     1
2 D         3     1
3 B         5     1
4 E         4     1
5 A         3     2
6 G         1     1
7 G         1     2
8 E         5     2
9 G         5     3
Karthik S
  • 11,348
  • 2
  • 11
  • 25
  • 1
    possibly an `ungroup()` at the end would be prudent if further analysis is going to be performed using pipes? – Miff Oct 21 '21 at 11:31