1

Given a named vector consisting of unit prices and associated price list names, I want to write a function that adds a new column to my dataset that should say which price list the price in the dataset belongs to.

The function should be able to handle missing values (NA and price not included in the price list) as well as duplicate entries in the price list. For (NA and price not included in the price list) something like "not found" is totally okay, and for duplicate entries i would take the first entry

pricelist = rlang::set_names(
  x = c(11.12, 11.45, 14.45, 12.66, 12.96, 14.45),
  nm = c("1", "2", "3", "4", "5", "6"))

data = tibble(
  article = rep("article 34", 10), 
  price = c(11.12, NA, 11.45, 11.45, 11.45, 14.45, NA, 20, 12.96, 12.66)) 

The desired output would look like this:

data %>%
  mutate(pricelist = pricelist_fn(price = price))

# A tibble: 10 x 3
   article    price pricelist
   <chr>      <dbl> <chr>    
 1 article 34  11.1 1        
 2 article 34  NA   not found
 3 article 34  11.4 2        
 4 article 34  11.4 2        
 5 article 34  11.4 2        
 6 article 34  14.4 3        
 7 article 34  NA   not found
 8 article 34  20   not found
 9 article 34  13.0 5        
10 article 34  12.7 4 
Konrad Rudolph
  • 530,221
  • 131
  • 937
  • 1,214
mcmurphy
  • 104
  • 6

1 Answers1

2
data %>%
  mutate(result = names(pricelist)[match(price, pricelist)])
# # A tibble: 10 × 3
#    article    price result
#    <chr>      <dbl> <chr>  
#  1 article 34  11.1 1      
#  2 article 34  NA   NA     
#  3 article 34  11.4 2      
#  4 article 34  11.4 2      
#  5 article 34  11.4 2      
#  6 article 34  14.4 3      
#  7 article 34  NA   NA     
#  8 article 34  20   NA     
#  9 article 34  13.0 5      
# 10 article 34  12.7 4

If you want a function, you can do this for the same result:

pricelist_fn = function(price, pricelist) {
  names(pricelist)[match(price, pricelist)]
}
data %>%
  mutate(result = pricelist_fn(price, pricelist))

I would add a word of caution, equality testing (like in match()) of floats is risky and can leading bugs due to floating point precision. See this FAQ for more details. If your prices are simply read in and kept, it's probably okay. But if you are doing math to calculate the prices or doing mathematical transformations of the prices, it can be risky and you may need to find another solution that checks for price matches within a tolerance.

Gregor Thomas
  • 136,190
  • 20
  • 167
  • 294
  • Thanks, this works fine! I was trying to do something like ```names(pricelist)[unname(pricelist)==price][1]``` instead of your solution. Could you elaborate why this does not work? Cant get my head arround it really – mcmurphy Dec 15 '22 at 14:54
  • 1
    If you look at the result of `unname(pricelist)==price`, you'll see it's just TRUE/FALSE, with no index. What you need is the indexes, which is what `match` gives you. – Gregor Thomas Dec 15 '22 at 15:43
  • 1
    You could do something with a loop, taking one price at a time and using `names(pricelist)[which(unname(pricelist) == price[i])]`, but that wouldn't be vectorized. – Gregor Thomas Dec 15 '22 at 15:46
  • 1
    I generally prefer lookup tables and joins to named vectors and `match`. If it were my code I'd make `pricelist` a data frame rather than a named vector: `pricelist = data.frame(price = c(11.12, 11.45, 14.45, 12.66, 12.96, 14.45), index = c("1", "2", "3", "4", "5", "6"))` and then use `data %>% left_join(pricelist, by = "price")`. But that's just a style preference. – Gregor Thomas Dec 15 '22 at 15:49