
Let's say we have this data:

type <- paste("type", c(1,1,1,2,3,1,2,2,3,3,3,3,1,1))
dates <- seq(as.Date("2000/1/1"), by = "days", length.out = length(type)) 
mydataframe <- data.frame(type, dates)

I saw in other posts that rle might do the job, but I want to obtain a data frame that gives, for each type, the mean persistence in days (the average length of its runs of consecutive days). Something like:

> print(persistance)
  type1 type2 type3
1     2   1.5   2.5

Does anyone know how to do this? Thanks!

alistaire
Andrei Niță

3 Answers


data.table

library(data.table)
runs <- setDT(rle(as.character(mydataframe$type)))
runs[, mean(lengths), values]

#    values  V1
# 1: type 1 2.0
# 2: type 2 1.5
# 3: type 3 2.5

tidyverse & magrittr

library(tidyverse)
library(magrittr)

rle(as.character(mydataframe$type)) %$% 
  tibble(lengths, values) %>% 
  group_by(values) %>% 
  summarise_all(mean)

# # A tibble: 3 x 2
#   values lengths
#   <chr>    <dbl>
# 1 type 1    2.00
# 2 type 2    1.50
# 3 type 3    2.50

dplyr

library(dplyr)
rle(as.character(mydataframe$type)) %>% 
  unclass %>%
  as.data.frame %>% 
  group_by(values) %>% 
  summarise_all(mean)
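
The results above are in long format (one row per type). If the one-row wide layout shown in the question is needed, a minimal base R sketch could look like the following. The name `persistence` and the step that strips the space from the column names are assumptions made for illustration, not part of the answers here.

```r
# Same types as in the question
type <- paste("type", c(1,1,1,2,3,1,2,2,3,3,3,3,1,1))

# Run lengths and the value of each run
runs <- rle(type)

# Mean run length per type, as a named numeric vector
means <- vapply(split(runs$lengths, runs$values), mean, numeric(1))

# Assumed naming: drop the space so columns read type1, type2, type3
names(means) <- gsub(" ", "", names(means))

# One row, one column per type
persistence <- as.data.frame(as.list(means))
print(persistence)
#   type1 type2 type3
# 1     2   1.5   2.5
```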
IceCreamToucan

An alternative (grouping) solution:

type <- paste("type", c(1,1,1,2,3,1,2,2,3,3,3,3,1,1))
dates <- seq(as.Date("2000/1/1"), by = "days", length.out = length(type)) 
mydataframe <- data.frame(type, dates)

library(dplyr)

mydataframe %>%
  count(type, group = cumsum(type != lag(type, default = first(type)))) %>%
  group_by(type) %>%
  summarise(Avg = mean(n))

# # A tibble: 3 x 2
#     type     Avg
#    <fct>  <dbl>
# 1 type 1   2  
# 2 type 2   1.5
# 3 type 3   2.5
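
To see why the grouping works, it may help to compute the run identifier on its own. The sketch below is a base R equivalent of the `cumsum(type != lag(type, ...))` expression, written without dplyr. The variable names `changed` and `group` are chosen here for illustration.

```r
# Same types as in the question
type <- paste("type", c(1,1,1,2,3,1,2,2,3,3,3,3,1,1))

# TRUE wherever the type differs from the previous row (first row never counts as a change)
changed <- c(FALSE, type[-1] != type[-length(type)])

# The cumulative sum increases by one at each change, so every run gets its own id
group <- cumsum(changed)
print(group)
# [1] 0 0 0 1 2 3 4 4 5 5 5 5 6 6
```

Counting rows per (type, group) pair then gives the length of each run, and averaging those counts per type gives the mean persistence.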
AntoniosK

You can use base R functions rle and aggregate to do this.

# set up the data as in your question
type <- paste("type", c(1,1,1,2,3,1,2,2,3,3,3,3,1,1))
dates <- seq(as.Date("2000/1/1"), by = "days", length.out = length(type)) 
mydataframe <- data.frame(type, dates)

# calculate the length of the run using rle 
runs <- rle(as.character(mydataframe$type))
# calculate the average length of the run
aggregate(runs[[1]], by = runs[2], FUN = mean)

Please note that this assumes the dates in your dates column are consecutive. If there were a gap in the dates and you wanted to treat it as the start of a new run, rle on the type column alone would not be enough; you would have to adapt the code so that a run also ends whenever two neighbouring rows are more than one day apart.
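
As a rough sketch of that adaptation, assuming the rows are sorted by date and that a run should break whenever neighbouring dates are not exactly one day apart, something along these lines could work. The gap introduced below is hypothetical and is not part of the data in the question.

```r
# Same types as in the question
type <- paste("type", c(1,1,1,2,3,1,2,2,3,3,3,3,1,1))
dates <- seq(as.Date("2000/1/1"), by = "days", length.out = length(type))
# Hypothetical gap (not in the original data): push rows 3 onward 5 days later
dates[3:14] <- dates[3:14] + 5
mydataframe <- data.frame(type, dates)

types <- as.character(mydataframe$type)
n <- length(types)

# A new run starts at row 1, when the type changes, or when the date
# is not exactly one day after the previous row
new_run <- c(TRUE,
             types[-1] != types[-n] | as.numeric(diff(mydataframe$dates)) != 1)
run_id <- cumsum(new_run)

run_lengths <- tabulate(run_id)  # number of rows in each run
run_types <- types[new_run]      # type of each run (taken at its first row)

# Mean run length per type
gap_means <- tapply(run_lengths, run_types, mean)
print(gap_means)
```

With this hypothetical gap, the first type 1 run is split into runs of 2 and 1 days, so the mean for type 1 drops from 2 to 1.5 while the other types are unchanged.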

ira