-8

I have the following dataframe:

enter image description here

1st column contains identification numbers (respnr) 2nd column contains reference date per identification number (cdatalg) 3rd through 16th column contains dates (datesc1:datesc14)

I wish to enter a 17th column in which the date in column 3:16 is printed which is closest to the reference date in column 2. I have tried combinations of apply and which.min and neardate, though can't figure it out.

A push in the right direction would be very much appreciated.

Jaap
  • 81,064
  • 34
  • 182
  • 193
  • 1
    Welcome to StackOverflow! Please read the info about [how to ask a good question](http://stackoverflow.com/help/how-to-ask) and how to give a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/5963610). This will make it much easier for others to help you. – Jaap Dec 22 '17 at 09:50
  • 2
    Also: never use pictures to show your data. See also the links from my previous comment. – Jaap Dec 22 '17 at 09:52

1 Answers1

1

This simple example will help you:

# example data
df = data.frame(id = c("A","B"),
                date_ref = c("2013-01-26", "2013-01-08"),
                date1 = c("2013-01-23", "2013-01-01"),
                date2 = c("2013-01-20", "2013-01-07"),
                stringsAsFactors = F)

df

#   id   date_ref      date1      date2
# 1  A 2013-01-26 2013-01-23 2013-01-20
# 2  B 2013-01-08 2013-01-01 2013-01-07

library(dplyr)
library(lubridate)
library(tidyr)

# update date column to datetime variables
# (if needed)
df = df %>% mutate_at(vars(matches("date")), ymd)

df %>%
  gather(type,date_new,-id,-date_ref) %>%        # reshape dataset
  group_by(id) %>%                               # for each id
  filter(abs(difftime(date_ref, date_new)) == min(abs(difftime(date_ref, date_new)))) %>%  # keep row with minimum distance between dates
  ungroup() %>%                                  # forget the grouping
  select(-type) %>%                              # remove that variable
  inner_join(df, by=c("id","date_ref"))          # join back original dataset

# # A tibble: 2 x 5
#      id   date_ref   date_new      date1      date2
#   <chr>     <date>     <date>     <date>     <date>
# 1     A 2013-01-26 2013-01-23 2013-01-23 2013-01-20
# 2     B 2013-01-08 2013-01-07 2013-01-01 2013-01-07

Not sure if you can have multiple dates with the same distance (same date multiple times of same number of days before/after your baseline date) and how you want to treat them.

AntoniosK
  • 15,991
  • 2
  • 19
  • 32