1

I have the following dput which produces DF1

structure(list(ID = c("ID_1", "ID_1", "ID_1", "ID_1", "ID_1", "ID_1", "ID_1", "ID_1", "ID_1", 
"ID_1"
), Book = c("Book_A", "Book_A", "Book_B", "Book_B", "Book_C", "Book_C", "Book_D", "Book_D", 
"Book_E", "Book_E"), OU = c("Over", "Under", "Over", "Under", "Over", "Under", "Over", 
"Under", "Over", "Under"), Price = c(102, -114, 100, -120, 102, -113, -102, -119, -120, 105), 
Points = c(6.5, 6.5, 6.5, 6.5, 6.5, 6.5, 6.5, 6.5, 6, 6)), row.names = c(NA, -10L), class = 
c("tbl_df", "tbl", "data.frame"))

DF1 tibble

# A tibble: 10 × 5
ID    Book   OU    Price Points
<chr> <chr>  <chr> <dbl>  <dbl>
1 ID_1  Book_A Over    102    6.5
2 ID_1  Book_A Under  -114    6.5
3 ID_1  Book_B Over    100    6.5
4 ID_1  Book_B Under  -120    6.5
5 ID_1  Book_C Over    102    6.5
6 ID_1  Book_C Under  -113    6.5
7 ID_1  Book_D Over   -102    6.5
8 ID_1  Book_D Under  -119    6.5
9 ID_1  Book_E Over   -120    6  
10 ID_1  Book_E Under   105    6 

and I have the following dput with produces DF2

structure(list(ID = c("ID_1", "ID_1", "ID_1", "ID_1", "ID_1", "ID_1", "ID_1", "ID_1", "ID_1", 
"ID_1"
), Book = c("Book_E", "Book_E", "Book_E", "Book_E", "Book_E", "Book_E", "Book_E", "Book_E", 
"Book_E", "Book_E"), 
OU = c("Over", "Over", "Over", "Over", "Over", "Under", "Under", 
   "Under", "Under", "Under"), Price = c(-303, -150, 107, 176, 372, 261, 133, -120, -200, 
-447), Points = c(5, 5.5, 6.5, 7, 8.5, 5, 5.5, 6.5, 7, 8.5)), row.names = c(NA, -10L), class = 
c("tbl_df", "tbl", "data.frame"))

DF2 tibble

# A tibble: 10 × 5
ID    Book   OU    Price Points
<chr> <chr>  <chr> <dbl>  <dbl>
1 ID_1  Book_E Over   -303    5  
2 ID_1  Book_E Over   -150    5.5
3 ID_1  Book_E Over    107    6.5
4 ID_1  Book_E Over    176    7  
5 ID_1  Book_E Over    372    8.5
6 ID_1  Book_E Under   261    5  
7 ID_1  Book_E Under   133    5.5
8 ID_1  Book_E Under  -120    6.5
9 ID_1  Book_E Under  -200    7  
10 ID_1  Book_E Under  -447    8.5

As you can see, the Points values in Rows 9 and 10 (Value of 6) of DF1 do not match the other Points values for Rows 1 - 8 (Value of 6.5). What I would like to do is replace the Price AND Points values for Row 9 and 10 with the corresponding Price and Points values in DF2 if the Points values don't match. This is the desired output which we will call DF3.

# A tibble: 10 × 5
ID    Book   OU    Price Points
<chr> <chr>  <chr> <dbl>  <dbl>
1 ID_1  Book_A Over    102    6.5
2 ID_1  Book_A Under  -114    6.5
3 ID_1  Book_B Over    100    6.5
4 ID_1  Book_B Under  -120    6.5
5 ID_1  Book_C Over    102    6.5
6 ID_1  Book_C Under  -113    6.5
7 ID_1  Book_D Over   -102    6.5
8 ID_1  Book_D Under  -119    6.5
9 ID_1  Book_E Over    107    6.5  
10 ID_1  Book_E Under  -120   6.5 

This is not static data and each time I make a call to an API, the contents of DF1 and DF2 will change. Any help would be much appreciated!

Aaron Morris
  • 123
  • 6

1 Answers1

1

Using the Mode() function provided here, change Price to NA where Points doesn’t match the modal value. Then set Points to the modal value, then update the missing values of Price using rows_patch():

library(dplyr)

Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

df1 %>%
  mutate(
    Price = ifelse(Points == Mode(Points), Price, NA),
    Points = Mode(Points)
  ) %>%
  rows_patch(
    df2, 
    by = c("ID", "Book", "OU", "Points"),
    unmatched = "ignore"
  )
# A tibble: 10 × 5
   ID    Book   OU    Price Points
   <chr> <chr>  <chr> <dbl>  <dbl>
 1 ID_1  Book_A Over    102    6.5
 2 ID_1  Book_A Under  -114    6.5
 3 ID_1  Book_B Over    100    6.5
 4 ID_1  Book_B Under  -120    6.5
 5 ID_1  Book_C Over    102    6.5
 6 ID_1  Book_C Under  -113    6.5
 7 ID_1  Book_D Over   -102    6.5
 8 ID_1  Book_D Under  -119    6.5
 9 ID_1  Book_E Over    107    6.5
10 ID_1  Book_E Under  -120    6.5
zephryl
  • 14,633
  • 3
  • 11
  • 30