1

I have a data frame as follows:

d <- data.frame(
    shop=c('A','A','A','A','A','B','B','B','C','C','C','C','D','E','E'),
    product=c(1:5,1:3,1:4,1,1:2), 
    price=c(16.83, 12.21, 9.99, 3.99, 1.00,
      19.50, 6.42, 1.89,
      13.95, 12.50, 7.87, 0.79,
      11.99,
      22.80, 15.99)  )

The data represent the prices of products held by shops, with the products numbered in reverse price-order within each shop. I'd like to discover various things (e.g., mean, distribution, maximum price) about the most expensive products across stores, and similarly about the least expensive. With that in mind, I want to rescale the product numbering so that in all stores, the most expensive is numbered 1 and the least expensive is numbered 10 (with fractional product numbers being just fine).

Things I know:

  1. How to scale a single vector of product numbers using the maximum and minimum values
  2. How to obtain the maximum product number by shop: aggregate(d[, 2], list(d$shop), max)

But I don't see how to fit the pieces together so that, for example, the product numbers for shop C are, from most expensive to least expensive, c(1,4,7,10).

CrimsonDark
  • 661
  • 9
  • 20
  • Please show how you arrived at `c(1,4,7,10)` for C and how you define the scale for #1. – Parfait Jul 02 '18 at 13:58
  • As I say in the OP, I am not trying to rescale the prices --- as shown in the answer below from misuse. I'm trying to rescale the product numbers. Rescaling four numbers, 1:4, over the range 1:10 gives an equal gap of 3 between the numbers ... c(1, 4, 7, 10). Of course, it is easy to make the slight change needed to @misuse answer. – CrimsonDark Jul 03 '18 at 00:42
  • It looks as if my question is a duplicate. I hadn't realised. https://stackoverflow.com/questions/41761018/scale-all-values-depending-on-group – CrimsonDark Jul 03 '18 at 00:44

1 Answers1

0

Here is an approach using dplyr and scales::rescale:

df %>%
  group_by(shop) %>% #group by shop
    arrange(shop, desc(price)) %>% #arrange by shop and price
    mutate(scaled = 1:n(), #create a label for price (in this case it matches the product id but in some cases it might not)
           scaled = scales::rescale(scaled, to = c(1, 10)))

scale to range 1 - 10

#output
# A tibble: 15 x 4
# Groups:   shop [5]
   shop  product price scaled
   <fct>   <dbl> <dbl>  <dbl>
 1 A           1 16.8    1   
 2 A           2 12.2    3.25
 3 A           3  9.99   5.5 
 4 A           4  3.99   7.75
 5 A           5  1     10   
 6 B           1 19.5    1   
 7 B           2  6.42   5.5 
 8 B           3  1.89  10   
 9 C           1 14.0    1   
10 C           2 12.5    4   
11 C           3  7.87   7   
12 C           4  0.79  10   
13 D           1 12.0    5.5 #one item in category, it is both min and max, so (1+10)/2 = 5.5
14 E           1 22.8    1   
15 E           2 16.0   10   

to get max product number one can use summarise after grouping:

df %>%
  group_by(shop) %>%
  summarise(max = max(product))
#output
# A tibble: 5 x 2
  shop    max
  <fct> <dbl>
1 A      5.00
2 B      3.00
3 C      4.00
4 D      1.00
5 E      2.00

or if one wants to summarise several variables at once (here on columns 2 and 3):

df %>%
  group_by(shop) %>%
  summarise_at(2:3, max)
## A tibble: 5 x 3
  shop  product price
  <fct>   <dbl> <dbl>
1 A        5.00  16.8
2 B        3.00  19.5
3 C        4.00  14.0
4 D        1.00  12.0
5 E        2.00  22.8
missuse
  • 19,056
  • 3
  • 25
  • 47
  • Thanks ... but note also that it is the product numbers, not the prices that I want to rescale; it's only a tiny change from the solution you've shown. By rescaling the prices, one ends up with all "most expensive" products valued at 1, whereas I want to retain the actual price so I can determine the distribution of all "least expensive" products (for example). – CrimsonDark Jul 03 '18 at 01:59