Say I have two data frames, each with a Date column. One column has length m and the other has length n, where m > n. I would like to know which elements of the longer column are not in the shorter one.

I could easily do this with df1$Date %in% df2$Date. However, I only want to compare the year and month of each date. Let's take an example:

# df1

df1 = structure(list(Date = structure(c(10961, 
10990, 11018, 11046, 11060, 11088, 11116, 11144, 11214, 11235, 
11249, 11263, 11305, 11354, 11382), class = "Date")), row.names = c(NA, 
15L), class = "data.frame")

# df2

df2 = structure(list(Date = structure(c(10961, 
10961, 11018, 11046, 11060, 11088, 11116, 11144, 11214, 11235, 
11249, 11263), class = "Date")), row.names = c(NA, 12L), class = "data.frame")

I would like to get the following:

# 2000-02-03
# 2000-12-14
# 2001-02-01
# 2001-03-01

However, I want to restrict the search to years and months only. In other words, if two dates share the same year and month but differ in the day, ideally the code shouldn't flag them.

Can anyone help me?

Rollo99

2 Answers

You can use

dplyr::setdiff(df1, df2)

or

df1$Date[!(df1$Date %in% df2$Date)]

# Date
# 1 2000-02-03
# 2 2000-12-14
# 3 2001-02-01
# 4 2001-03-01
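
Note that both snippets above compare complete dates. They reproduce the expected output for this data only because none of the unmatched dates shares a month with a date in df2. A minimal sketch of the difference, using two made-up vectors (x and y are hypothetical and not part of the question):

```r
# Hypothetical data: x and y share January 2000, but on different days.
x <- as.Date(c("2000-01-05", "2000-02-03"))
y <- as.Date("2000-01-20")

# Exact comparison: 2000-01-05 is reported because the full date differs.
exact <- x[!(x %in% y)]

# Year-month comparison: dates are reduced to "YYYY-MM" keys first,
# so 2000-01-05 matches 2000-01-20 and only 2000-02-03 is reported.
by_month <- x[!(format(x, "%Y-%m") %in% format(y, "%Y-%m"))]

print(exact)
print(by_month)
```

If same-month dates with different days can occur in the real data, the year-month comparison is the one that matches the requirement stated in the question.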
Darren Tsai
AlexB

Is this what you're looking for?

df1$Date[!format(df1$Date, "%Y-%m") %in% format(df2$Date, "%Y-%m")]
[1] "2000-02-03" "2000-12-14" "2001-02-01" "2001-03-01"
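
If this comparison is needed in more than one place, it could be wrapped in a small function. The following is only a sketch: the name setdiff_year_month is made up for illustration, and df1 and df2 are rebuilt from the day counts given in the question so that the block runs on its own.

```r
# Rebuild the question's data from its day counts (days since 1970-01-01).
df1 <- data.frame(Date = as.Date(
  c(10961, 10990, 11018, 11046, 11060, 11088, 11116, 11144,
    11214, 11235, 11249, 11263, 11305, 11354, 11382),
  origin = "1970-01-01"))
df2 <- data.frame(Date = as.Date(
  c(10961, 10961, 11018, 11046, 11060, 11088, 11116, 11144,
    11214, 11235, 11249, 11263),
  origin = "1970-01-01"))

# Hypothetical helper: return the elements of x whose year-month
# ("YYYY-MM") does not occur among the year-months of y.
setdiff_year_month <- function(x, y) {
  x[!(format(x, "%Y-%m") %in% format(y, "%Y-%m"))]
}

res <- setdiff_year_month(df1$Date, df2$Date)
print(res)
# [1] "2000-02-03" "2000-12-14" "2001-02-01" "2001-03-01"
```

Because the inputs are already of class Date, format() can be applied directly and no as.Date() conversion is needed.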
Chris Ruehlemann