1

I am just venturing into R programming and finding my way around.

Lets say I have a table as below:

Store | Product | Sales
X | A | 2
X | B | 1
X | C | 3
Y | A | 1
Y | B | 2
Y | C | 5
Z | A | 3
Z | B | 6
Z | C | 2

I need to change the sales values of certain products based on another table. Please find below:

Product | Sales
A | 10
B | 7
C | 15

My final table should be:

Store | Product | Sales
X | A | 10
X | B | 7
X | C | 15
Y | A | 10
Y | B | 7
Y | C | 15
Z | A | 10
Z | B | 7
Z | C | 15

I have 2 methods of doing this now: 1) Using joins 2) Using an if-else statement inside a for loop to subset the

Is there any other way to do this more effectively and in fewer steps? Thanks in advance!

EDIT: I forgot to mention an exception earlier. What if my dataset is like below?

Store | Product | Sales
X | A | 2
X | B | 1
X | C | 3
X | D | 4
Y | A | 1
Y | B | 2
Y | C | 5
Y | D | 2
Z | A | 3
Z | B | 6
Z | C | 2 
Z | D | 3

There's an extra product(D) with sales. I want to retain the value of sales for that product if it is not present in the 2nd table which is:

 Product | Sales
    A | 10
    B | 7
    C | 15
  • 4
    *"I have 2 methods of doing this now"* Please share your code. – Maurits Evers Jun 06 '18 at 13:12
  • 3
    Using joins sounds like a good way to go. You could also use `match`. – Gregor Thomas Jun 06 '18 at 13:14
  • `df1$SalesNew <- c(A = 10, B = 7, C = 15)[ df1$Product ]` – zx8754 Jun 06 '18 at 13:15
  • @zx8754 This will be cumbersome if real data has a lot more levels. – LAP Jun 06 '18 at 13:18
  • 1
    @LAP not really, `c(A = 10, B = 7, C = 15)` could be generated using existing dataframe. OP wants "other" solutions, but I would go with joins/match solution. – zx8754 Jun 06 '18 at 13:19
  • @zx8754 I appreciate the help, but this will not help in general cases where there will be a lot of data and the values are different. If you can provide a more general solution, that'll be great! Thank you! – Adithyashankar Ajith Jun 06 '18 at 13:20
  • As I mentioned, it can be more automated, like: `df1$SalesNew <- setNames(df2$Sales, df2$Product)[ df1$Product ]` – zx8754 Jun 06 '18 at 13:22
  • 2
    `df1$SalesNew <- df2$Sales[match(df1$Product, df2$Product)]` will work as suggested by @Gregor. – LAP Jun 06 '18 at 13:23
  • 1
    I vote to close, as I don't understand why your existing 2 methods are not good enough, post is "Too broad". – zx8754 Jun 06 '18 at 13:23

2 Answers2

1

How about this join?

Since you want to change the Sales values of certain Products only so to illustrate this I have considered only two products in lookup_df

library(dplyr)

df %>%
  left_join(lookup_df, by = "Product", suffix = c("_Original", "_New")) %>%
  mutate(Sales_New = coalesce(Sales_New, Sales_Original))

Output is:

  Store Product Sales_Original Sales_New
1     X       A              2        10
2     X       B              1         1
3     X       C              3        15
4     Y       A              1        10
5     Y       B              2         2
6     Y       C              5        15
7     Z       A              3        10
8     Z       B              6         6
9     Z       C              2        15

Sample data:

df <- structure(list(Store = c("X", "X", "X", "Y", "Y", "Y", "Z", "Z", 
"Z"), Product = c("A", "B", "C", "A", "B", "C", "A", "B", "C"
), Sales = c(2L, 1L, 3L, 1L, 2L, 5L, 3L, 6L, 2L)), .Names = c("Store", 
"Product", "Sales"), class = "data.frame", row.names = c(NA, 
-9L))

lookup_df <- structure(list(Product = c("A", "C"), Sales = c(10L, 15L)), .Names = c("Product", "Sales"), class = "data.frame", row.names = c(NA, 
-2L))
#  Product Sales
#1       A    10
#2       C    15
Prem
  • 11,775
  • 1
  • 19
  • 33
  • Hi. Thank you for your help. I've included a small edit to the question with an exception. – Adithyashankar Ajith Jun 06 '18 at 14:13
  • I already have considered your updated case. If you see the sample data used in my answer carefully then in `lookup_df` product `B` is not there but in `df` it has sales value. – Prem Jun 06 '18 at 17:32
0

If you use a lookup-vector, it is relatively short:

d <- read.table(text = "
Store | Product | Sales
X | A | 2
X | B | 1
X | C | 3
Y | A | 1
Y | B | 2
Y | C | 5
Z | A | 3
Z | B | 6
Z | C | 2", sep = "|", header = T, stringsAsFactors = F)

lookup <- read.table(text = "Product | Sales
A | 10
B | 7
C | 15", sep = "|", header = T, stringsAsFactors = F)

lookup$Product <- gsub("^\\s+|\\s+$", "", lookup$Product) # remove spaces
lookup <- setNames(lookup$Sales, lookup$Product) # convert to vector
d$Product <- gsub("^\\s+|\\s+$", "", d$Product) # remove spaces

d$Sales <- lookup[d$Product] # main part
d
r.user.05apr
  • 5,356
  • 3
  • 22
  • 39