-2

I am having difficulty with writing a script in R.

What I am trying to get is:

  • I have a list of Product Codes and Product Descriptions. What I am trying to get is the view in table 2 with MinPercentile >=0.95, MinPercentile >=0.80 and MinPercentile >=0.60 columns filled in. The rule for MinPercentile >=0.95 is:

if the value in the Percentile column is greater or equals to 0.95, and this value is the closest to 0.95 at the same time (so if there were two values say 0.98 and 0.99 then I would be interested in 0.98 since it is closer to 0.95) then bring in the invoice price appearing for that value for tat prodcode to column MinPercentile >=0.95. So looking at the table one 4.19 is the invoice price for the percentile 0.956 which is the closest to 0.95.

The same rule applies to other two columns though only the value criteria are different, we are looking at >= 0.80 and >= 0.60 and again need to bring the invoice price for the closest to 0.60 and 0.80 percentile values into my MinPercentile >=0.80 and MinPercentile >=0.60 columns.

Ideally I am looking to create Table 2 at the product code level that is a summary of table 1, meeting described by me criteria. I hope it makes sense. I would really appreciate help with that.

Appreciate your effort.

enter image description here

  • Why is this tagged javascript? – Axeman Feb 06 '17 at 11:57
  • Also, please read the info about [how to ask a good question](http://stackoverflow.com/help/how-to-ask) and how to give a [reproducible example](http://stackoverflow.com/questions/5963269). This will make it much easier for others to help you. Giving data as an image is not very helpful, since no-one is going to type that in manually. In addition, please show what you have tried so far, and why it is going wrong. – Axeman Feb 06 '17 at 11:59
  • `dat <- data.frame(prod = c(rep('A', 9), rep('B', 3)), perc = c(.1, .2, .3, .36, .61, .8, .946, .956, .97, .3, .81, .98), price = runif(12, min = 2, max = 4)) ` – GGamba Feb 06 '17 at 13:14

1 Answers1

0

With dplyr and tidyr:

dat <- data.frame(prod = c(rep('A', 9), rep('B', 3)), 
                  perc = c(.1, .2, .3, .36, .61, .8, .946, .956, .97, .3, .81, .98), 
                  price = runif(12, min = 2, max = 4))   

dat %>% 
    # Group by product and percentile range
    group_by(prod, 
             group = cut(perc, 
                         breaks = c(.6, .8, .95, 1), 
                         labels = c('Perc6', 'Perc8', 'Perc95')
                        )
    ) %>% 
    # Make sure percentile are sorted in each group
    arrange(perc) %>% 
    # Take just the first
    slice(1) %>% 
    # Drop percentile column
    select(-perc) %>%
    # Format as desired 
    spread(group, price)

Source: local data frame [2 x 5]
Groups: prod [2]

    prod    Perc6    Perc8   Perc95   `<NA>`
* <fctr>    <dbl>    <dbl>    <dbl>    <dbl>
1      A 2.633812 2.020161 3.505599 2.067004
2      B       NA 2.709279 2.126438 3.586837

Pls note that it does not achieve exactly what you asked as it will divide the percentiles in 'hard' groups. E.g.: For Product B, Perc>=.6 will be NA, as there are no records between .6 and .8, it will only fill the Perc>=.8. It will also have a 'NA' where it lists the value of the lowest percentile.. You can drop it.
There are probably better and more efficient ways to do this, that's the first thing that came to mind.

GGamba
  • 13,140
  • 3
  • 38
  • 47