I'd like to restructure my data frame, which features data on whether a home was viewed, purchased, or sold, and what the date for the viewing/purchase/sale was.
The data has been enterred in such a way that whilst there are multiple date variables (var_datevar_A5_3_B32
,var_datevar_A5_4_B33
,var_datevar_A5_5_B34
), each of these variables contains a mix of sales-dates, purchase-dates, and viewing-dates.
So if a want to know if a date in eg var_datevar_A5_3_B32
is a viewing, sales, or purchase date, I need to check variables with a matching prefix (A5_3_B32), ie var_viewed_A5_3_B32
, var_sold_A5_3_B32
, and var_purchased_A5_3_B32
, and see which one is coded as "Y".
I'd like to add three new variables to my data frame: a sales dates only variable (var_date_sold
), a purchase dates only variable (var_date_purchased
), and a viewing dates only variable (var_date_viewed
), like in dfgoal
.
I've made an attempt but apart from that I can't figure out how to paste the date into my new variable, I'd like a more general solution because in my real data set I've got loads of variables like this.
Thanks for any help!
# Starting point
df <- data.frame(var_home=c("a","b","c","d","e"),
var_datevar_A5_3_B32=c("01-12-2012","05-06-2014","03-09-2015","03-03-2016",NA),var_datevar_A5_4_B33=c("07-12-2012",NA,"07-07-2017","13-03-2016",NA),var_datevar_A5_5_B34=c("05-03-2013",NA,NA,"23-03-2016",NA),var_purchased_A5_3_B32=c(NA,"Y",NA,NA,NA),var_purchased_A5_4_B33=c("N","N","Y",NA,NA),var_purchased_A5_5_B34=c("Y",NA,NA,NA,NA),
var_viewed_A5_3_B32=c("Y","N","Y","Y",NA),var_viewed_A5_4_B33=c(NA,NA,NA,"Y","N"),var_viewed_A5_5_B34=c(NA,NA,NA,"Y",NA),var_sold_A5_3_B32=c("NA","N",NA,"N",NA),var_sold_A5_4_B33=c("Y","N",NA,"N",NA),var_sold_A5_5_B34=c("N",NA,NA,NA,"Y"))
# Goal
dfgoal <- data.frame(var_home=c("a","b","c","d","e"),
var_datevar_A5_3_B32=c("01-12-2012","05-06-2014","03-09-2015","03-03-2016",NA),var_datevar_A5_4_B33=c("07-12-2012",NA,"07-07-2017","13-03-2016",NA),var_datevar_A5_5_B34=c("05-03-2013",NA,NA,"23-03-2016",NA),var_purchased_A5_3_B32=c(NA,"Y",NA,NA,NA),var_purchased_A5_4_B33=c("N","N","Y",NA,NA),var_purchased_A5_5_B34=c("Y",NA,NA,NA,NA),
var_viewed_A5_3_B32=c("Y","N","Y","Y",NA),var_viewed_A5_4_B33=c(NA,NA,NA,"Y","N"),var_viewed_A5_5_B34=c(NA,NA,NA,"Y",NA),var_sold_A5_3_B32=c("NA","N",NA,"N",NA),var_sold_A5_4_B33=c("Y","N",NA,"N",NA),var_sold_A5_5_B34=c("N",NA,NA,NA,"Y"),
var_date_viewed =c("01-12-2012",NA,"03-09-2015","03-03-2016,13-03-2016,23-03-2016",NA),var_date_sold=c("07-12-2012",NA,NA,NA,NA),var_date_purchased=c("05-03-2013","05-06-2014","07-07-2017",NA,NA))
# Attempt pseudo-code
var_date_viewed <- ifelse((!is.na(var_datevar_A5_3_B32) & var_viewed_A5_3_B32=="Y") | (!is.na(var_datevar_A5_4_B33) & var_viewed_A5_4_B33=="Y") | (!is.na(var_datevar_A5_5_B34) & var_viewed_A5_5_B34=="Y"), PASTE_THE_DATE,NA)