Subtract columns from two different datasets

Question

I would like to know how to do a subtraction between the dataset I got (`All`) and my `df1` dataset. I inserted an image to illustrate the desired output: for each matching `date2`/`Category` row, I want the `coef` value from `All` minus each of the `DR0..` columns of `df1`.

library(dplyr)
library(tidyverse)
library(lubridate)

df1 <- structure(
  list(date1= c("2021-06-28","2021-06-28","2021-06-28","2021-06-28"),
       date2 = c("2021-06-30","2021-06-30","2021-07-01","2021-07-01"),
       Category = c("FDE","ABC","FDE","ABC"),
       Week= c("Wednesday","Wednesday","Friday","Friday"),
       DR1 = c(4,1,6,3),
       DR01 = c(4,1,4,3), DR02= c(4,2,6,2),DR03= c(9,5,4,7),
       DR04 = c(5,4,3,2),DR05 = c(5,4,5,4),
       DR06 = c(2,4,3,2)),
  class = "data.frame", row.names = c(NA, -4L))

> df1
       date1      date2 Category      Week DR1 DR01 DR02 DR03 DR04 DR05 DR06
1 2021-06-28 2021-06-30      FDE Wednesday   4    4    4    9    5    5    2
2 2021-06-28 2021-06-30      ABC Wednesday   1    1    2    5    4    4    4
3 2021-06-28 2021-07-01      FDE    Friday   6    4    6    4    3    5    3
4 2021-06-28 2021-07-01      ABC    Friday   3    3    2    7    2    4    2

return_coef <- function(dmda, CategoryChosse) {
  
  x<-df1 %>% select(starts_with("DR0"))
  
  x<-cbind(df1, setNames(df1$DR1 - x, paste0(names(x), "_PV")))
  PV<-select(x, date2,Week, Category, DR1, ends_with("PV"))
  
  med<-PV %>%
    group_by(Category,Week) %>%
    summarize(across(ends_with("PV"), median))
  
  SPV<-df1%>%
    inner_join(med, by = c('Category', 'Week')) %>%
    mutate(across(matches("^DR0\\d+$"), ~.x + 
                    get(paste0(cur_column(), '_PV')),
                  .names = '{col}_{col}_PV')) %>%
    select(date1:Category, DR01_DR01_PV:last_col())
  
  SPV<-data.frame(SPV)
  
  mat1 <- df1 %>%
    filter(date2 == dmda, Category == CategoryChosse) %>%
    select(starts_with("DR0")) %>%
    pivot_longer(cols = everything()) %>%
    arrange(desc(row_number())) %>%
    mutate(cs = cumsum(value)) %>%
    filter(cs == 0) %>%
    pull(name)
  
  (dropnames <- paste0(mat1,"_",mat1, "_PV"))
  
  SPV <- SPV %>%
    filter(date2 == dmda, Category == CategoryChosse) %>%
    select(-any_of(dropnames))
  
  datas<-SPV %>%
    filter(date2 == ymd(dmda)) %>%
    group_by(Category) %>%
    summarize(across(starts_with("DR0"), sum)) %>%
    pivot_longer(cols= -Category, names_pattern = "DR0(.+)", values_to = "val") %>%
    mutate(name = readr::parse_number(name))
  colnames(datas)[-1]<-c("Days","Numbers")
  
  datas <- datas %>% 
    group_by(Category) %>% 
    slice((as.Date(dmda) - min(as.Date(df1$date1) [
      df1$Category == first(Category)])):max(Days)+1) %>%
    ungroup
  
  mod <- nls(Numbers ~ b1*Days^2+b2,start = list(b1 = 0,b2 = 0),data = datas, algorithm = "port")
  as.numeric(coef(mod)[2])
  
}

All<-cbind(df1 %>% select(date2, Category), coef = mapply(return_coef, df1$date2, df1$Category))
> All
       date2 Category coef
1 2021-06-30      FDE    4
2 2021-06-30      ABC    1
3 2021-07-01      FDE    6
4 2021-07-01      ABC    3

Output I want

FYI, we need *none* of that code to help you here, just your `df1 <- structure(..)` and the **results** of creating `All`. All of the other code is a distraction (and a deterrent: some may see "lots of code" as a reason to dismiss a question as too complex or time-consuming). – r2evans, Oct 31 '21 at 23:06
Ah, sorry @r2evans, you're right! Thank you very much for the reply! Is it possible to remove the `DR1` column and leave it as I did in the image I inserted? In addition, if you can, please adjust the column names. – Oct 31 '21 at 23:09

r2evans · Accepted Answer · score 2 · 2021-10-31T23:11:39.490

This is a combination of a join/merge (see "How to join (merge) data frames (inner, outer, left, right)" and "What's the difference between INNER JOIN, LEFT JOIN, RIGHT JOIN and FULL JOIN?") and an `across()` mutate:

library(dplyr)
left_join(All, df1, by = c("date2", "Category")) %>%
  mutate(across(starts_with("DR0"), ~ coef - .))
#        date2 Category coef      date1      Week DR1 DR01 DR02 DR03 DR04 DR05 DR06
# 1 2021-06-30      FDE    4 2021-06-28 Wednesday   4    0    0   -5   -1   -1    2
# 2 2021-06-30      ABC    1 2021-06-28 Wednesday   1    0   -1   -4   -3   -3   -3
# 3 2021-07-01      FDE    6 2021-06-28    Friday   6    2    0    2    3    1    3
# 4 2021-07-01      ABC    3 2021-06-28    Friday   3    0    1   -4    1   -1    1

Data

df1 <- structure(list(date1 = c("2021-06-28", "2021-06-28", "2021-06-28", "2021-06-28"), date2 = c("2021-06-30", "2021-06-30", "2021-07-01", "2021-07-01"), Category = c("FDE", "ABC", "FDE", "ABC"), Week = c("Wednesday", "Wednesday", "Friday", "Friday"), DR1 = c(4, 1, 6, 3), DR01 = c(4, 1, 4, 3), DR02 = c(4, 2, 6, 2), DR03 = c(9, 5, 4, 7), DR04 = c(5, 4, 3, 2), DR05 = c(5, 4, 5, 4), DR06 = c(2, 4, 3, 2)), class = "data.frame", row.names = c(NA, -4L))
All <- structure(list(date2 = c("2021-06-30", "2021-06-30", "2021-07-01", "2021-07-01"), Category = c("FDE", "ABC", "FDE", "ABC"), coef = c(4L, 1L, 6L, 3L)), class = "data.frame", row.names = c("1", "2", "3", "4"))
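For comparison, the same join-then-subtract idea can also be written in base R. This is a minimal sketch, not part of the original answer: it swaps in `merge()` for `left_join()` and `lapply()` for `across()`, and re-creates `df1` and `All` with plain `data.frame()` calls carrying the same values as the `structure()` calls above, so it runs without any packages. Note that `merge()` sorts by the key columns by default, so look rows up by key rather than by position.

```r
# Same values as the df1/All structures above, written as data.frame() calls.
df1 <- data.frame(
  date1 = rep("2021-06-28", 4),
  date2 = c("2021-06-30", "2021-06-30", "2021-07-01", "2021-07-01"),
  Category = c("FDE", "ABC", "FDE", "ABC"),
  Week = c("Wednesday", "Wednesday", "Friday", "Friday"),
  DR1 = c(4, 1, 6, 3),
  DR01 = c(4, 1, 4, 3), DR02 = c(4, 2, 6, 2), DR03 = c(9, 5, 4, 7),
  DR04 = c(5, 4, 3, 2), DR05 = c(5, 4, 5, 4), DR06 = c(2, 4, 3, 2)
)
All <- data.frame(
  date2 = c("2021-06-30", "2021-06-30", "2021-07-01", "2021-07-01"),
  Category = c("FDE", "ABC", "FDE", "ABC"),
  coef = c(4L, 1L, 6L, 3L)
)

# Left join on the two key columns; all.x = TRUE keeps every row of All.
res <- merge(All, df1, by = c("date2", "Category"), all.x = TRUE)

# "DR0" followed by digits matches DR01..DR06; DR1 is left untouched.
dr_cols <- grep("^DR0[0-9]+$", names(res), value = TRUE)

# Replace each DR0x column with that row's coef minus the DR0x value.
res[dr_cols] <- lapply(res[dr_cols], function(col) res$coef - col)
res
```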

edited Oct 31 '21 at 23:11

answered Oct 31 '21 at 23:05

r2evans

Yes (continuing our conversation down here), I just tightened up the `starts_with` a little more. FYI, the first argument to `across()` is anything that could go into `dplyr::select()`, including `dplyr::starts_with(..)`, `everything()`, `c(DR01, DR02)`, `-c(date1, date2, Category, Week, coef, DR1)`, etc. I hope you can see how to extend that to include or exclude variables as you see fit. – r2evans Oct 31 '21 at 23:13
@r2evans: Have you noticed that `mutate(df1, across(DR01:DR06, ~ DR1 - .))` gives the same result? Here DR1 = coef. – TarJae Oct 31 '21 at 23:29

@TarJae perhaps, but I was going off the OP's indication that the math should be `coef-.`, suggesting it may not always be the same as `DR1`. – r2evans Oct 31 '21 at 23:35
Thanks @r2evans, but when I use part of your code, i.e. `left_join(All..)`, in the code I included in the question, the output differs from the output in your answer. Do you know why? – Oct 31 '21 at 23:51
I don't know what your console is showing you. – r2evans Nov 01 '21 at 01:09
@JVieira, any updates? We can't help if all you can say is it *"is different"*. How is it different? What is different about your data (from here in the OP) that could be causing it? – r2evans Nov 03 '21 at 09:57
Sorry for the delay in replying, @r2evans; the answer is correct. – Nov 05 '21 at 11:36
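The comment thread above raises two points that a short sketch can make concrete: `across()` accepts any tidyselect expression, and `coef - .` only matches `DR1 - .` when those two columns happen to be equal. This is a minimal sketch, assuming dplyr 1.0 or later (where `across()` was introduced) and re-creating `df1` and `All` from the answer's Data section. `Week` is excluded explicitly in the negative selection because it is a character column. The `select(-DR1)` step is one way to address the OP's request, and the `coef + 1L` shift is purely hypothetical, there only to show the two formulas diverging.

```r
library(dplyr)

# Data re-created from the answer's "Data" section.
df1 <- data.frame(
  date1 = rep("2021-06-28", 4),
  date2 = c("2021-06-30", "2021-06-30", "2021-07-01", "2021-07-01"),
  Category = c("FDE", "ABC", "FDE", "ABC"),
  Week = c("Wednesday", "Wednesday", "Friday", "Friday"),
  DR1 = c(4, 1, 6, 3),
  DR01 = c(4, 1, 4, 3), DR02 = c(4, 2, 6, 2), DR03 = c(9, 5, 4, 7),
  DR04 = c(5, 4, 3, 2), DR05 = c(5, 4, 5, 4), DR06 = c(2, 4, 3, 2)
)
All <- data.frame(
  date2 = c("2021-06-30", "2021-06-30", "2021-07-01", "2021-07-01"),
  Category = c("FDE", "ABC", "FDE", "ABC"),
  coef = c(4L, 1L, 6L, 3L)
)

joined <- left_join(All, df1, by = c("date2", "Category"))

# 1) Three tidyselect expressions that pick the same six DR0x columns.
by_prefix <- joined %>% mutate(across(starts_with("DR0"), ~ coef - .))
by_range  <- joined %>% mutate(across(DR01:DR06, ~ coef - .))
by_negate <- joined %>%
  mutate(across(-c(date1, date2, Category, Week, coef, DR1), ~ coef - .))
stopifnot(identical(by_prefix, by_range), identical(by_prefix, by_negate))

# Drop DR1 afterwards, as the OP asked in the comments.
result <- by_prefix %>% select(-DR1)

# 2) In this sample coef equals DR1 row by row, so both formulas agree ...
same_here <- all(joined$coef == joined$DR1)
via_dr1 <- joined %>% mutate(across(DR01:DR06, ~ DR1 - .))

# ... but with a hypothetical shifted coef they no longer do.
shifted <- joined %>% mutate(coef = coef + 1L)
differs <- !identical(
  shifted %>% mutate(across(DR01:DR06, ~ coef - .)),
  shifted %>% mutate(across(DR01:DR06, ~ DR1 - .))
)
```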

score 0 · Answer 2 · answered Oct 31 '21 at 23:22

Here is an alternative dplyr approach using `bind_cols`, a rarely used dplyr function. Interestingly, it did not work without converting to the integer class:

library(dplyr)
df1 %>% 
  bind_cols(coef = All$coef) %>% 
  mutate(across(DR1:coef, as.integer),   # convert DR1..coef to integer first
         across(DR01:DR06, ~coef - .))   # then coef minus each DR0x column

  date1      date2      Category Week        DR1  DR01  DR02  DR03  DR04  DR05  DR06  coef
  <chr>      <chr>      <chr>    <chr>     <int> <int> <int> <int> <int> <int> <int> <int>
1 2021-06-28 2021-06-30 FDE      Wednesday     4     0     0    -5    -1    -1     2     4
2 2021-06-28 2021-06-30 ABC      Wednesday     1     0    -1    -4    -3    -3    -3     1
3 2021-06-28 2021-07-01 FDE      Friday        6     2     0     2     3     1     3     6
4 2021-06-28 2021-07-01 ABC      Friday        3     0     1    -4     1    -1     1     3

I'd think the presumption that the row-order is always the same might be a touch fragile. Granted, we see all of the code that went into making `All`, but ... in my experience, `cbind`/`bind_cols` is a risk when alignment is not verified. Simpler code, though. — r2evans, Oct 31 '21 at 23:36
You are right. But my first choice, `left_join`, had already been provided by you, so I posted the `bind_cols` version as an alternative! – TarJae, Oct 31 '21 at 23:38
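The row-order concern can be checked directly. This is a minimal sketch, assuming dplyr 1.0 or later and the same `df1`/`All` values as in the answers. `keys_aligned()` and `All_shuffled` are hypothetical names introduced only for illustration, and `mutate(coef = ...)` stands in for `bind_cols()` here because both pair rows purely by position.

```r
library(dplyr)

df1 <- data.frame(
  date1 = rep("2021-06-28", 4),
  date2 = c("2021-06-30", "2021-06-30", "2021-07-01", "2021-07-01"),
  Category = c("FDE", "ABC", "FDE", "ABC"),
  Week = c("Wednesday", "Wednesday", "Friday", "Friday"),
  DR1 = c(4, 1, 6, 3),
  DR01 = c(4, 1, 4, 3), DR02 = c(4, 2, 6, 2), DR03 = c(9, 5, 4, 7),
  DR04 = c(5, 4, 3, 2), DR05 = c(5, 4, 5, 4), DR06 = c(2, 4, 3, 2)
)
All <- data.frame(
  date2 = c("2021-06-30", "2021-06-30", "2021-07-01", "2021-07-01"),
  Category = c("FDE", "ABC", "FDE", "ABC"),
  coef = c(4L, 1L, 6L, 3L)
)

# Hypothetical guard to run before any positional bind (cbind/bind_cols):
# the key columns must already line up row by row.
keys_aligned <- function(x, y) {
  identical(x$date2, y$date2) && identical(x$Category, y$Category)
}
stopifnot(keys_aligned(df1, All))  # holds for the data as posted

# Hypothetical: All arrives in a different row order.
All_shuffled <- All[c(2, 1, 4, 3), ]

# Positional pairing silently attaches the wrong coef to each row ...
by_position <- df1 %>%
  mutate(coef = All_shuffled$coef) %>%
  mutate(across(DR01:DR06, ~ coef - .))

# ... while a join matches on date2 + Category, so row order is irrelevant.
by_key <- df1 %>%
  left_join(All_shuffled, by = c("date2", "Category")) %>%
  mutate(across(DR01:DR06, ~ coef - .))
```

If the guard fails, the join is the safer route; if it passes, the positional version gives the same result with simpler code.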
