
I wish to extract the top n largest values from time-series data, i.e. the top n values for January, the top n values for February, and so on.

# Data set example
library(dplyr)

df <- data.frame(
  variables = rep(c("height", "weight", "mass", "IQ", "EQ"), times = 12),
  month = rep(1:12, each = 5),
  values = rnorm(60, 3, 1)
)

head(df, 10)
     variables month   values
1     height     1 1.859971
2     weight     1 3.985432
3       mass     1 4.755852
4         IQ     1 1.507079
5         EQ     1 2.816110
6     height     2 2.394953
7     weight     2 3.256810
8       mass     2 3.776439
9         IQ     2 3.038668
10        EQ     2 3.540750

I'm trying to extract the top 3 values for each month, but I get this error:

df %>% 
  group_by(month) %>% 
  summarise(top.three = top_n(3))

Error in UseMethod("tbl_vars") : 
  no applicable method for 'tbl_vars' applied to an object of class "c('double', 'numeric')"

Could anyone advise please? Thanks.

Desmond
  • This pulls the top 3 values of each month: `df %>% group_by(month) %>% top_n(3)`. If you need the top value of each variable, add that to your grouping statement. – LJW Nov 24 '19 at 07:09

1 Answer

When you use `summarise`, every expression you pass to it must reduce each group to a single value (length 1).
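To illustrate the length-1 rule, here is a minimal sketch that rebuilds the question's toy data (the `set.seed(42)` call is an addition for reproducibility, and `max_per_month` / `top3_per_month` are hypothetical names). A summary such as `max()` returns one number per month and works directly; to keep three values inside `summarise`, they have to be wrapped in `list()` so each group still produces a single (list) value.

```r
library(dplyr)

# Toy data as in the question; set.seed() is added here so results are reproducible
set.seed(42)
df <- data.frame(
  variables = rep(c("height", "weight", "mass", "IQ", "EQ"), times = 12),
  month     = rep(1:12, each = 5),
  values    = rnorm(60, 3, 1)
)

# OK: max() reduces each month to exactly one value
max_per_month <- df %>%
  group_by(month) %>%
  summarise(max_value = max(values))

# OK: the three largest values are wrapped in list(), so each month
# still yields a single element (a list-column entry holding 3 numbers)
top3_per_month <- df %>%
  group_by(month) %>%
  summarise(top.three = list(sort(values, decreasing = TRUE)[1:3]))
```

The list-column keeps one row per month; if you want one row per value instead, filtering rows (as below) is the more natural approach.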

How about sorting by the column first and then taking the top 3?

df %>% arrange(desc(values)) %>% group_by(month) %>% top_n(3, wt = values)

Or, if you want to see your results sorted by month:

df %>% arrange(month, desc(values)) %>% group_by(month) %>% top_n(3, wt = values)

# A tibble: 36 x 3
# Groups:   month [12]
   variables month values
   <fct>     <int>  <dbl>
 1 height        1   5.42
 2 mass          1   3.21
 3 EQ            1   3.19
 4 EQ            2   4.66
 5 weight        2   4.40
 6 IQ            2   3.97
 7 IQ            3   4.73
 8 height        3   3.89
 9 mass          3   3.73
10 IQ            4   3.97
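In dplyr 1.0.0 and later, `top_n()` is superseded by `slice_max()`, which also returns the kept rows ordered by decreasing value within each group. A minimal sketch, assuming dplyr >= 1.0.0 and reusing the question's toy data (with an added `set.seed(42)`; `top3` is a hypothetical name):

```r
library(dplyr)

# Toy data as in the question; set.seed() is added here so results are reproducible
set.seed(42)
df <- data.frame(
  variables = rep(c("height", "weight", "mass", "IQ", "EQ"), times = 12),
  month     = rep(1:12, each = 5),
  values    = rnorm(60, 3, 1)
)

# Keep the 3 rows with the largest `values` in each month;
# n must be named because it comes after `...` in slice_max()
top3 <- df %>%
  group_by(month) %>%
  slice_max(values, n = 3) %>%
  ungroup()
```

Like `top_n()`, `slice_max()` keeps ties by default (`with_ties = TRUE`), so a month can return more than 3 rows if values tie.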
StupidWolf
  • Thanks so much for this solution, it worked like a charm. So if I understand correctly, part of my code returned a result of length greater than 1? Using `summarise()` together with `top_n()`, perhaps? – Desmond Nov 24 '19 at 08:07
  • The way I see it, `top_n` is not a `summarise`-ing function like, for example, `sum` or `mean`, so the `summarise` is superfluous here. As @StupidWolf mentions, a function that is passed into `summarise` needs to return a single value, and `top_n` (with `n = 3`) doesn't return a single value. – Valeri Voev Nov 24 '19 at 09:35
  • Hi @Desmond, sorry for the late reply. There are two problems: 1. `top_n` has to take a tbl (data frame) object as its input, so even when you write it as `summarise(x = top_n(1))` it will not work; 2. as @ValeriVoev pointed out above, `summarise` expects a result of length one. – StupidWolf Nov 24 '19 at 12:53
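Since `top_n` keeps rows rather than summarising them, the same result can also be expressed as a plain row filter. A base R sketch, assuming no tied values within a month (true for continuous `rnorm` data); `within_month_rank` and `top3_base` are hypothetical names, and `set.seed(42)` is added for reproducibility:

```r
# Toy data as in the question; set.seed() is added here so results are reproducible
set.seed(42)
df <- data.frame(
  variables = rep(c("height", "weight", "mass", "IQ", "EQ"), times = 12),
  month     = rep(1:12, each = 5),
  values    = rnorm(60, 3, 1)
)

# Rank values within each month, largest first (rank of the negated values).
# ave() returns a vector as long as df$values, so it works as a row filter.
within_month_rank <- ave(-df$values, df$month, FUN = rank)
top3_base <- df[within_month_rank <= 3, ]

# Order the result by month, then by decreasing value
top3_base <- top3_base[order(top3_base$month, -top3_base$values), ]
```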