
I'm using R for my data analysis and am looking for code that produces the output shown below.

I need a single piece of code that handles everything at once, because my actual data has over 500 groups and 24 months. The sample below has only 2 groups and 2 months.

This is a sample of my data.

Date    Group   Value
1-Jan-16    A   10
2-Jan-16    A   12
3-Jan-16    A   17
4-Jan-16    A   20
5-Jan-16    A   12
5-Jan-16    B   56
1-Jan-16    B   78
15-Jan-16   B   97
20-Jan-16   B   77
21-Jan-16   B   86
2-Feb-16    A   91
2-Feb-16    A   44
3-Feb-16    A   93
4-Feb-16    A   87
5-Feb-16    A   52
5-Feb-16    B   68
1-Feb-16    B   45
15-Feb-16   B   100
20-Feb-16   B   81
21-Feb-16   B   74

And this is the output I'm looking for.

Month   Year    Group   Minimum Value   5th Percentile  10th Percentile 50th Percentile 90th Percentile Max Value
Jan 2016    A                       
Jan 2016    B                       
Feb 2016    A                       
Feb 2016    B       
  • Please DO NOT post data as an image. If you want help, the first step is to take a little time and copy/paste (or type) the data in a form that we can copy ourselves. Your chances of getting substantial help increase considerably if you make it easy. I suggest you quickly read about [minimal](http://stackoverflow.com/help/mcve) and [reproducible](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) questions. It really does make a difference! – r2evans Feb 07 '17 at 04:46
  • Have you checked [this similar question](http://stackoverflow.com/questions/5473537/how-to-calculate-95th-percentile-of-values-with-grouping-variable-in-r-or-excel)? – Aramis7d Feb 07 '17 at 04:48
  • @r2evans I have corrected the data. Is this the right way of sharing data? – Vishnu Raj Feb 07 '17 at 05:03
  • Closer. The second link specifically suggested the use of `dput(variablename)`. Think about this: if somebody told you to copy the data from this page into an R data.frame without typing it in verbatim, how would you do it? Now take the same variable, copy the output from `dput(...)` into the question, and you will see that it makes it far easier for us to use your data. – r2evans Feb 07 '17 at 05:11
  • Now the next thing suggested is to show the code you have already tried. SO is not a site for "please do this for me". – r2evans Feb 07 '17 at 05:17
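Following the `dput()` suggestion in the comments, the sketch below shows one way to enter the sample so that others can paste it straight into R. The name `dft` is borrowed from the answer below. The dates are kept as character strings, exactly as they appear in the question.

```r
# Sample data from the question, typed in so it can be pasted directly into R.
# Dates stay as "day-Mon-yy" character strings, as in the question.
dft <- data.frame(
  Date = c("1-Jan-16", "2-Jan-16", "3-Jan-16", "4-Jan-16", "5-Jan-16",
           "5-Jan-16", "1-Jan-16", "15-Jan-16", "20-Jan-16", "21-Jan-16",
           "2-Feb-16", "2-Feb-16", "3-Feb-16", "4-Feb-16", "5-Feb-16",
           "5-Feb-16", "1-Feb-16", "15-Feb-16", "20-Feb-16", "21-Feb-16"),
  # rows come in blocks of five: A, B (January), then A, B (February)
  Group = rep(rep(c("A", "B"), each = 5), times = 2),
  Value = c(10, 12, 17, 20, 12, 56, 78, 97, 77, 86,
            91, 44, 93, 87, 52, 68, 45, 100, 81, 74),
  stringsAsFactors = FALSE
)

# dput() prints R code that recreates the object exactly. Its output is
# what the comments suggest pasting into the question.
dput(dft)
```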

1 Answer


Assuming `dft` is your input data frame, you can try:

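library(dplyr)
library(lubridate)  # month() and year() come from lubridate, not dplyr

dft %>% 
  # parse strings such as "1-Jan-16"; %b expects English month abbreviations
  mutate(Date = as.Date(Date, format = "%d-%b-%y")) %>%
  mutate(mon = month(Date),
         yr = year(Date)) %>%
  group_by(mon, yr, Group) %>%
  # per-group statistics, repeated on every row of the group
  mutate(minimum = min(Value),
         maximum = max(Value),
         q95 = quantile(Value, 0.95)) %>%
  # select() keeps the grouping columns mon, yr and Group automatically
  select(minimum, maximum, q95) %>%
  # collapse the repeated rows to one row per group
  unique()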

which gives:

    mon    yr Group minimum maximum   q95
  <int> <int> <chr>   <int>   <int> <dbl>
1     1  2016     A      10      20  19.4
2     1  2016     B      56      97  94.8
3     2  2016     A      44      93  92.6
4     2  2016     B      45     100  96.2

Add more summary columns as needed, for example the 5th, 10th, 50th and 90th percentiles requested in the question.
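A sketch of how the answer might be extended to the full set of statistics requested in the question is shown below. It uses `summarise()`, as suggested in the comments, so no `unique()` step is needed. It also uses base `format()` in place of lubridate's `month()`/`year()`. Assumptions: dplyr 1.0.0 or later (for `.groups = "drop"`), R's default quantile type 7, and an English "C" time locale for parsing `%b`. The column names `Minimum`, `P05`, ..., `Maximum` are illustrative choices, not names from the question.

```r
library(dplyr)

# Sample data from the question; dates are "day-Mon-yy" strings
dft <- data.frame(
  Date = c("1-Jan-16", "2-Jan-16", "3-Jan-16", "4-Jan-16", "5-Jan-16",
           "5-Jan-16", "1-Jan-16", "15-Jan-16", "20-Jan-16", "21-Jan-16",
           "2-Feb-16", "2-Feb-16", "3-Feb-16", "4-Feb-16", "5-Feb-16",
           "5-Feb-16", "1-Feb-16", "15-Feb-16", "20-Feb-16", "21-Feb-16"),
  Group = rep(rep(c("A", "B"), each = 5), times = 2),
  Value = c(10, 12, 17, 20, 12, 56, 78, 97, 77, 86,
            91, 44, 93, 87, 52, 68, 45, 100, 81, 74),
  stringsAsFactors = FALSE
)

# "%b" parses month abbreviations in the current locale, so switch the
# time locale to "C" (English abbreviations) for this session.
invisible(Sys.setlocale("LC_TIME", "C"))

result <- dft %>%
  mutate(Date = as.Date(Date, format = "%d-%b-%y"),
         Year = as.integer(format(Date, "%Y")),
         MonthNum = as.integer(format(Date, "%m")),
         Month = month.abb[MonthNum]) %>%   # locale-independent "Jan", "Feb", ...
  group_by(Year, MonthNum, Month, Group) %>%
  # one row per Year/Month/Group; quantile() uses R's default type 7
  summarise(Minimum = min(Value),
            P05 = quantile(Value, 0.05, names = FALSE),
            P10 = quantile(Value, 0.10, names = FALSE),
            P50 = quantile(Value, 0.50, names = FALSE),
            P90 = quantile(Value, 0.90, names = FALSE),
            Maximum = max(Value),
            .groups = "drop") %>%
  # chronological order, then the column layout shown in the question
  arrange(Year, MonthNum, Group) %>%
  select(Month, Year, Group, Minimum, P05, P10, P50, P90, Maximum)

print(result)
```

Grouping is by `Year`, `MonthNum` and `Group`, so the same code should cover any number of groups and months without changes, including the 500 groups and 24 months mentioned in the question.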

Aramis7d
  • Any particular reason for (1) breaking the first `mutate` into two calls, (2) using `dft$Date` inside the `mutate`, and (3) using `mutate()` instead of `summarize()` (which would obviate the need for `unique()`)? – r2evans Feb 07 '17 at 05:21
  • 1. Clarity: the second `mutate` depends on the first. 2. My bad, it wasn't really needed. 3. Yes, but I personally prefer the `mutate` and `select` cycle, which lets me quickly look at intermediate outputs. Obviously, your edits are better suited to production-level code. :) – Aramis7d Feb 07 '17 at 05:27
  • I learned a lot from reading answers on SO, though the distinction between "demonstrative verbosity" and "efficient code" was not always clear to me. I'm still not a pro, but I think clarity and simplicity are very useful as teaching aids. – r2evans Feb 07 '17 at 05:35
  • True. I learned a lot here as well, and I think I understand what you mean by this trade-off of using more function calls than minimally needed to make things clearer and easily reproducible in the future. After all, this is not [the place for golfing](http://codegolf.stackexchange.com/) – Aramis7d Feb 07 '17 at 06:04
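To illustrate point (3) from the comments, the sketch below computes the same per-group minimum and maximum in two ways. The first uses the answer's `mutate()` + `select()` + `unique()` cycle. The second uses a single `summarise()` call. It assumes dplyr 1.0.0 or later and a "C" time locale. Base `format()` stands in for lubridate, and the object names `via_mutate` and `via_summarise` are hypothetical.

```r
library(dplyr)

# Sample data from the question
dft <- data.frame(
  Date = c("1-Jan-16", "2-Jan-16", "3-Jan-16", "4-Jan-16", "5-Jan-16",
           "5-Jan-16", "1-Jan-16", "15-Jan-16", "20-Jan-16", "21-Jan-16",
           "2-Feb-16", "2-Feb-16", "3-Feb-16", "4-Feb-16", "5-Feb-16",
           "5-Feb-16", "1-Feb-16", "15-Feb-16", "20-Feb-16", "21-Feb-16"),
  Group = rep(rep(c("A", "B"), each = 5), times = 2),
  Value = c(10, 12, 17, 20, 12, 56, 78, 97, 77, 86,
            91, 44, 93, 87, 52, 68, 45, 100, 81, 74),
  stringsAsFactors = FALSE
)
invisible(Sys.setlocale("LC_TIME", "C"))  # so "%b" matches "Jan", "Feb"
dft$Date <- as.Date(dft$Date, format = "%d-%b-%y")
dft$mon <- as.integer(format(dft$Date, "%m"))
dft$yr <- as.integer(format(dft$Date, "%Y"))

# Style 1 (the answer): mutate() repeats each group's statistics on every row,
# so select() and unique() are needed to collapse to one row per group.
via_mutate <- dft %>%
  group_by(mon, yr, Group) %>%
  mutate(minimum = min(Value), maximum = max(Value)) %>%
  select(minimum, maximum) %>%   # grouping columns are kept automatically
  unique() %>%
  ungroup()

# Style 2 (the comment's suggestion): summarise() returns one row per group.
via_summarise <- dft %>%
  group_by(mon, yr, Group) %>%
  summarise(minimum = min(Value), maximum = max(Value), .groups = "drop")

print(via_mutate)
print(via_summarise)
```

Both pipelines yield one row per month and group. The `summarise()` form simply skips the intermediate repeated rows, which are what make the `mutate()` cycle convenient for inspecting outputs along the way.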