I'm having a tough time adapting a script that I've previously used to read in and recode sequentially labeled data.tables.
I have a series of data.tables in R that are sequentially labeled: df1, df2, df3, etc. There are then specific (and inconsistent) rules that I apply to create new variables in those data.tables called status and csat.
What I would like to do is:
- Read in the data tables
- Recode the csat variable into a new variable
- Subset the data.table so it only includes 4 variables (csat, csat_d, id, and status)
- Merge the data.table with previous tables using an outer join (so it can be reshaped into long form)
I am trying to address points 1-3 in the script below, and have no idea how to implement #4.
EDITED:
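library(car)  # recode() with this spec syntax is assumed to be car::recode

df_names <- list(df, df2, df3)  # List of data.tables (list() keeps each table intact; c() would flatten them into columns)
csat_vars <- c("CustomerId", "csat", "csat_d", "status")  # The 4 variables to keep

out <- lapply(seq_along(df_names), function(idx) {
  d <- df_names[[idx]]  # [[ ]] extracts the table itself, [ ] would return a one-element list
  d$csat_d <- recode(d$csat, "1:5=0;6:7=1;NA=NA;")  # Dichotomize csat into csat_d
  subset(d, select = csat_vars)  # Return only the 4 variables
})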
I am agnostic about whether or not it's better to use data.table or data.frame (these are small datasets), so any help is welcome.
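To make the four steps concrete, here is a minimal base R sketch of what I am after, not a working solution I am confident in. It assumes plain data.frames named df1, df2, df3 holding only the first three rows of each mini-dataset below, that csat only takes values 1-7 or NA, and that CustomerId is the join key. The `_1`, `_2`, `_3` column suffixes and the names `tables`, `keep`, `prepped`, and `wide` are hypothetical, and ifelse() stands in for recode() so the example has no package dependency.

```r
# Abbreviated stand-ins for the mini-datasets (first three rows each).
df1 <- data.frame(CustomerId = c(1499L, 433L, 2600L),
                  csat = c(4L, 6L, NA),
                  status = c("Active", "Active", "Active"),
                  touch = c(2L, 3L, 2L))
df2 <- data.frame(CustomerId = c(6L, 5L, 149L),
                  csat = c(4L, NA, 6L),
                  status = c("Active", "Lapsed/Churned", "Active"),
                  touch = c(3L, NA, 3L))
df3 <- data.frame(CustomerId = c(1713L, 1611L, 1630L),
                  csat = c(4L, 6L, 4L),
                  status = c("Active", "Active", "Active"),
                  AGENCY_1 = NA_integer_)

tables <- list(df1, df2, df3)                          # step 1: collect the tables
keep <- c("CustomerId", "csat", "csat_d", "status")    # the 4 variables to keep

prepped <- lapply(seq_along(tables), function(idx) {
  d <- tables[[idx]]
  # Step 2: 1-5 -> 0, 6-7 -> 1, NA stays NA (assumes csat is in 1..7 or NA).
  d$csat_d <- ifelse(d$csat >= 6, 1L, 0L)
  # Step 3: keep only the 4 variables.
  d <- d[, keep]
  # Suffix the non-key columns with the table index so they stay distinct after merging.
  names(d)[names(d) != "CustomerId"] <- paste0(setdiff(keep, "CustomerId"), "_", idx)
  d
})

# Step 4: full outer join on CustomerId across all tables.
wide <- Reduce(function(x, y) merge(x, y, by = "CustomerId", all = TRUE), prepped)
```

With all = TRUE, a customer who appears in only one table still gets a row in `wide`, with NA in the columns that come from the other tables.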
Mini-datasets here:
> dput(head(df))
structure(list(respid = c(1499L, 433L, 2600L, 2282L, 1503L, 3304L
), csat = c(4L, 6L, NA, NA, 6L, 4L), status = c("Active", "Active",
"Active", "Active", "Active", "Active"), touch = c(2L, 3L, 2L,
3L, 2L, 2L)), .Names = c("CustomerId", "csat", "status", "touch"), class = c("data.table",
"data.frame"), row.names = c(NA, -6L), .internal.selfref = <pointer: 0x7f800301b778>)
> dput(head(df2_r))
structure(list(respid = c(6L, 5L, 149L, 147L, 270L, 145L), csat = c(4L,
NA, 6L, 7L, 7L, 4L), status = c("Active", "Lapsed/Churned", "Active",
"Active", "Active", "Active"), touch = c(3L, NA, 3L, 1L, 3L,
1L)), .Names = c("CustomerId", "csat", "status", "touch"), class = c("data.table",
"data.frame"), row.names = c(NA, -6L), .internal.selfref = <pointer: 0x7f800301b778>)
> dput(head(df3))
structure(list(respid = c(1713L, 1611L, 1630L, 1773L, 1391L,
1571L), csat = c(4L, 6L, 4L, 5L, 7L, 4L), status = c("Active",
"Active", "Active", "Active", "Active", "Active"), AGENCY_1 = c(NA_integer_,
NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_
)), .Names = c("CustomerId", "csat", "status", "AGENCY_1"), class = c("data.table",
"data.frame"), row.names = c(NA, -6L), .internal.selfref = <pointer: 0x7f800301b778>)