0

I have two data frame and I want to subset specific rows in df2. Here are df1 and df2:

df1:
   Sdate    columnA    D
2020-05-14    DD       1
2020-05-14    FF       5
2020-05-14    EE       6
2020-05-14    GG       7

df2:
   Sdate      ColA     C
2020-04-13    NN       1
2020-04-13    XX       1
2020-04-14    VV       5
2020-04-15    DD       6
2020-04-16    AA       7

Here are the steps to get my final output:

  1. I need to calculate date differences between df1's [1,1] which is "2020-05-14" and df2's [1,1] which is "2020-04-13"
  2. I need to figure out if the difference is larger than 10 days.
  3. Finally, if it is larger than 10 days, I want to delete rows having oldest dates in df2. Because 2020-04-13 is the oldest date in df2, I want to delete first two lows of df2.

"2020-05-14" - "2020-04-13" is 31. Therefore, my final output of df2 should be

   Sdate     ColA      C
2020-04-14    VV       5
2020-04-15    DD       6
2020-04-16    AA       7

I tried with the codes following:

df2 <- ifelse(as.numeric(as.Date(as.character(df1[1,1]), format="%Y-%m-%d")-
                         as.Date(as.character(df2[1,1]), format="%Y-%m-%d"))>10,
                       subset(df2, Sdate!= df2[1,1]),print("Pass"))

I tested this code separately in three pieces, and they worked well. But it doesn't in combined code above. df2 is just gone with the code.

What should I change to get what I want to have?

Chris
  • 3
  • 2

1 Answers1

1

You can use dplyr for this. I have provided a method where you don't need to compare the first row, but can simply take the minimum.

library(dplyr)

new_df <- df2 %>% 
  mutate(
    isOldest = Sdate == min(Sdate),
    deleteOldest = as.integer(min(df1$Sdate) - min(Sdate)) > 10
  ) %>% 
  filter(!(isOldest & deleteOldest))

If instead you actually do need just a comparison of the first row:

new_df <- df2 %>% 
  mutate(
    isOldest = Sdate == df2$Sdate[1],
    deleteOldest = as.integer(df1$Sdate[1] - df2$Sdate[1]) > 10
  ) %>% 
  filter(!(isOldest & deleteOldest))

Hope this is what you need. The dataframes below.

df1 <- data.frame(
  Sdate = as.Date('2020-05-14'),
  columnA = c('DD', 'FF', 'EE', 'GG'),
  D = c(1, 5, 6, 7),
  stringsAsFactors = FALSE
)

df2 <- data.frame(
  Sdate = as.Date(c(rep('2020-04-13', 2), '2020-04-14', '2020-04-15',' 2020-04-16')),
  colA = c('NN', 'XX', 'VV', 'DD', 'AA'),
  C = c(1, 1, 5, 6, 7),
  stringsAsFactors = FALSE
)
WilliamGram
  • 673
  • 3
  • 7
  • Thank you so much, but I got an error: Problem with `mutate()` input `deleteOldest`. x is a non-numeric factor for binary. I think my original's data frame has different date format in Sdate column. – Chris Mar 18 '21 at 07:46
  • How did you read in the dataframes? I explicitly set stringsAsFactors to false. Perhaps this explains the difference? – WilliamGram Mar 18 '21 at 08:26
  • Hey, it works when I edit like my original question: deleteOldest = as.numeric(as.Date(as.character(min(Prv$Sdate)),format="%Y-%m-%d") - as.Date(as.character((Sdate)),format="%Y-%m-%d")) > 10 Do you know why it happens like this? Thank you anyway! – Chris Mar 19 '21 at 07:47
  • Perfect that it worked. I would suggest trying `class(Prv$Sdate)`. I suspect, in your case, it says factor. There is a post [here](https://stackoverflow.com/questions/2851015/convert-data-frame-columns-from-factors-to-characters) that could be worth looking at. – WilliamGram Mar 19 '21 at 07:56