I would like to count, for each row, how many times its ID has occurred so far (including the current row), using dplyr.

Example data:

dates <- as.Date(as.character(c("2011-01-13",
                                    "2011-01-14",
                                    "2011-01-15",
                                    "2011-01-16",
                                    "2011-01-17",
                                    "2011-01-13",
                                    "2011-01-14",
                                    "2011-01-15",
                                    "2011-01-16",
                                    "2011-01-17",
                                    "2011-01-13",
                                    "2011-01-14",
                                    "2011-01-15",
                                    "2011-01-16",
                                    "2011-01-17",
                                    "2011-01-17",
                                    "2011-01-17",
                                    "2011-01-18",
                                    "2011-01-18")))

    ID <- c("1","2","3","3","1","5","6","5","7","8","1","2","11","2","12","5","5","1","4")
    # put together
    data <- data.frame(dates,ID)
    data

        dates     ID
    1  2011-01-13  1
    2  2011-01-14  2
    3  2011-01-15  3
    4  2011-01-16  3
    5  2011-01-17  1
    6  2011-01-13  5
    7  2011-01-14  6
    8  2011-01-15  5
    9  2011-01-16  7
    10 2011-01-17  8
    11 2011-01-13  1
    12 2011-01-14  2
    13 2011-01-15 11
    14 2011-01-16  2
    15 2011-01-17 12
    16 2011-01-17  5
    17 2011-01-17  5
    18 2011-01-18  1
    19 2011-01-18  4

I would like to construct a dataset which looks like:

          dates    ID       prev_occurrence
    1  2011-01-13  1             1
    2  2011-01-14  2             1
    3  2011-01-15  3             1
    4  2011-01-16  3             2
    5  2011-01-17  1             2
    6  2011-01-13  5             1
    7  2011-01-14  6             1
    8  2011-01-15  5             2
    9  2011-01-16  7             1
    10 2011-01-17  8             1
    11 2011-01-13  1             3
    12 2011-01-14  2             2
    13 2011-01-15 11             1
    14 2011-01-16  2             3
    15 2011-01-17 12             1
    16 2011-01-17  5             3
    17 2011-01-17  5             4
    18 2011-01-18  1             4
    19 2011-01-18  4             1

where the count for an ID goes up by 1 each time that ID has already occurred in an earlier row.
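
The counting rule can be spelled out as a plain loop. This is only a minimal sketch in base R on a made-up vector: the helper name `running_count` and the values in `ids` are illustrative and not part of the original data.

```r
# Hypothetical helper: for each element, count how often its value has
# appeared up to and including that position.
running_count <- function(x) {
  seen <- list()                 # maps value -> number of times seen so far
  out <- integer(length(x))
  for (i in seq_along(x)) {
    key <- as.character(x[i])
    n <- if (is.null(seen[[key]])) 0L else seen[[key]]
    seen[[key]] <- n + 1L
    out[i] <- seen[[key]]
  }
  out
}

ids <- c("1", "2", "3", "3", "1", "5")   # illustrative values
counts <- running_count(ids)
counts
# [1] 1 1 1 2 2 1
```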

So far I have tried to solve this using duplicated(). However, the output doesn't look very promising:

library(dplyr)

data_dups <- data %>% 
  group_by(dates) %>% 
  mutate(dups = duplicated(ID)) %>%
  filter(dups) %>% 
  summarise(occurence = n())

           dates occurence
          <date>     <int>
    1 2011-01-13         1
    2 2011-01-14         1
    3 2011-01-17         1

Frank
Mamba
  • `ave(seq_along(data$ID), data$ID, FUN = seq_along)` – d.b Aug 21 '17 at 14:32
  • I'm sorry, I just realized I made a mistake regarding the order of the time series. I am editing the question now. – Mamba Aug 21 '17 at 14:33
  • @d.b That is exactly what the output should look like, thank you! Is it possible to integrate that into a `dplyr` pipe, for instance in a `mutate` statement? – Mamba Aug 21 '17 at 14:35
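
One way the `ave()` call from the comment could sit inside a `mutate()` is sketched below. It assumes a data frame shaped like the one in the question. Here `data` is a stand-in holding only the first six rows, and the column name `prev_occurrence` is illustrative.

```r
library(dplyr)

# Stand-in for the question's data (first six rows only)
data <- data.frame(
  dates = as.Date(c("2011-01-13", "2011-01-14", "2011-01-15",
                    "2011-01-16", "2011-01-17", "2011-01-13")),
  ID = c("1", "2", "3", "3", "1", "5")
)

# Inside mutate(), ID refers to the column, so data$ is not needed.
# ave() applies seq_along() within each level of ID and returns the
# results in the original row order.
result <- data %>%
  mutate(prev_occurrence = ave(seq_along(ID), ID, FUN = seq_along))

result$prev_occurrence
# [1] 1 1 1 2 2 1
```

Because `ave()` does its own grouping, no `group_by()` is needed in this variant.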

2 Answers

In dplyr you can use row_number() to count occurrences within groups.

library(tidyverse)
data %>% 
  arrange(dates) %>% 
  group_by(ID) %>% 
  mutate(occurrence = row_number())

# A tibble: 19 x 3
# Groups:   ID [10]
#          dates     ID occurrence
#         <date> <fctr>      <int>
#  1 2011-01-13      1          1
#  2 2011-01-13      5          1
#  3 2011-01-13      1          2
#  4 2011-01-14      2          1
#  5 2011-01-14      6          1
#  6 2011-01-14      2          2
#  7 2011-01-15      3          1
#  8 2011-01-15      5          2
#  9 2011-01-15     11          1
# 10 2011-01-16      3          2
# 11 2011-01-16      7          1
# 12 2011-01-16      2          3
# 13 2011-01-17      1          3
# 14 2011-01-17      8          1
# 15 2011-01-17     12          1
# 16 2011-01-17      5          3
# 17 2011-01-17      5          4
# 18 2011-01-18      1          4
# 19 2011-01-18      4          1

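Note that row_number() counts rows in their current order within each ID, so the result depends on how the data is sorted. arrange(dates) is added so that occurrences are counted chronologically.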
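
If the rows should come back in their original order after counting chronologically, one possible approach is to record the row position first and restore it afterwards. This is only a sketch on made-up data: the helper column `row_id` and the four example rows are assumptions for illustration.

```r
library(dplyr)

# Illustrative data, deliberately not sorted by date
data <- data.frame(
  dates = as.Date(c("2011-01-15", "2011-01-13", "2011-01-14", "2011-01-13")),
  ID = c("1", "1", "2", "2")
)

result <- data %>%
  mutate(row_id = row_number()) %>%       # remember the original position
  arrange(dates) %>%                      # count in chronological order
  group_by(ID) %>%
  mutate(occurrence = row_number()) %>%   # running count within each ID
  ungroup() %>%
  arrange(row_id) %>%                     # restore the original order
  select(-row_id)

result$occurrence
# [1] 2 1 2 1
```

The first row is the later of the two dates for ID 1, so it receives 2 even though it appears first in the data.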

loki

Try this, using dplyr::row_number() within each ID:

data %>% group_by(ID) %>% mutate(occurrence = row_number())
Prradep