I'm having a tough time adapting a script that I've previously used to read in and recode sequentially labeled data.tables.

I have a series of data.tables in R that are sequentially labeled: df1, df2, df3, etc. I then apply specific (and inconsistent) rules to create new variables called status and csat in those data.tables.

What I would like to do is:

  1. Read in the data tables
  2. Recode the csat variable into a new variable
  3. Subset the data.table so it only includes 4 variables (csat, csat_d, id, and status)
  4. Merge the data.table with previous tables using an outer join (so it can be reshaped into long form)

I am trying to address points 1-3 in the script below, and have no idea how to implement #4.

EDITED:

df_names<-c(df,df2,df3)  # Create list of data.tables
csat_vars<-c("CustomerId","csat","csat_d","status") # Create list of 4 variables

out <- lapply(1:length(df_names), function(idx) {
  d <- df_names[idx]
  d$csat_d <- recode(d$csat,"1:5=0;6:7=1;NA=NA;")
  d <- subset(d, select=c(csat_vars))
})

I am agnostic about whether or not it's better to use data.table or data.frame (these are small datasets), so any help is welcome.

Mini-datasets here:

> dput(head(df))
structure(list(respid = c(1499L, 433L, 2600L, 2282L, 1503L, 3304L
), csat = c(4L, 6L, NA, NA, 6L, 4L), status = c("Active", "Active", 
"Active", "Active", "Active", "Active"), touch = c(2L, 3L, 2L, 
3L, 2L, 2L)), .Names = c("CustomerId", "csat", "status", "touch"), class = c("data.table", 
"data.frame"), row.names = c(NA, -6L), .internal.selfref = <pointer: 0x7f800301b778>)

> dput(head(df2_r))
structure(list(respid = c(6L, 5L, 149L, 147L, 270L, 145L), csat = c(4L, 
NA, 6L, 7L, 7L, 4L), status = c("Active", "Lapsed/Churned", "Active", 
"Active", "Active", "Active"), touch = c(3L, NA, 3L, 1L, 3L, 
1L)), .Names = c("CustomerId", "csat", "status", "touch"), class = c("data.table", 
"data.frame"), row.names = c(NA, -6L), .internal.selfref = <pointer: 0x7f800301b778>)

> dput(head(df3))
structure(list(respid = c(1713L, 1611L, 1630L, 1773L, 1391L, 
1571L), csat = c(4L, 6L, 4L, 5L, 7L, 4L), status = c("Active", 
"Active", "Active", "Active", "Active", "Active"), AGENCY_1 = c(NA_integer_, 
NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_
)), .Names = c("CustomerId", "csat", "status", "AGENCY_1"), class = c("data.table", 
"data.frame"), row.names = c(NA, -6L), .internal.selfref = <pointer: 0x7f800301b778>)
Matt Dowle
roody
  • Make a list of the actual `data.table`s, not their names. Also, do you have `data.table`s or `data.frame`s? I ask because you are using `data.frame` nomenclature to handle `data.table`s, i.e. not using `:=` – Simon O'Hanlon Jan 30 '14 at 17:40
  • They are `data.tables` and due to ignorance I guess I'm using the wrong nomenclature :( Feel free to correct me! Changing to the list first. – roody Jan 30 '14 at 17:43

1 Answer

At a guess I'd say you want to do this...

out <- lapply( ll , function(x) x[ , csat := recode( csat , ,"1:5=0;6:7=1;NA=NA;" ) ][ , csat_vars , with = FALSE ] )

Here is a toy worked example:

df1 <- data.table( a = 1 , b = 2 , c = 3 )
df2 <- data.table( a = 1 , b = 2 , c = 3 )
ll <- list(df1,df2) 
vars <- c( "a" , "c" )
#  Recode column 'c' to 10, and then subset data.table to only columns 'a' and 'c'
lapply( ll , function(x)  x[ , c := 10 ][ , vars , with = FALSE  ] )
#[[1]]
#   a  c
#1: 1 10

#[[2]]
#   a  c
#1: 1 10
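
Applied to the tables in the question, the same pattern might look like the sketch below. It is only a sketch under a few assumptions: the tables are collected with `list()`, the recode is written to a new column `csat_d` (as the question asks) rather than over `csat`, and `recode()` (apparently from the car package) is swapped for a plain base-R comparison so the example has no extra dependency. The data are the first three rows of the posted `dput` output. The `rbindlist()` and `merge()` calls for step 4, and the `wave` column name, are my guess at what is wanted there, not something confirmed in the thread.

```r
library(data.table)

# First three rows of two of the posted tables
df1 <- data.table(CustomerId = c(1499L, 433L, 2600L),
                  csat       = c(4L, 6L, NA),
                  status     = c("Active", "Active", "Active"),
                  touch      = c(2L, 3L, 2L))
df2 <- data.table(CustomerId = c(6L, 5L, 149L),
                  csat       = c(4L, NA, 6L),
                  status     = c("Active", "Lapsed/Churned", "Active"),
                  touch      = c(3L, NA, 3L))

ll        <- list(df1, df2)   # list(), not c(), so each element stays a data.table
csat_vars <- c("CustomerId", "csat", "csat_d", "status")

out <- lapply(ll, function(x) {
  x <- copy(x)                           # leave the original tables untouched
  x[, csat_d := as.integer(csat >= 6)]   # 1:5 -> 0, 6:7 -> 1, NA stays NA
  x[, csat_vars, with = FALSE]           # keep only the four columns
})

# Step 4, option A (assumption): stack straight into long form,
# with a hypothetical 'wave' column recording the source table
long <- rbindlist(out, idcol = "wave")

# Step 4, option B (assumption): outer join of two tables on CustomerId
wide <- merge(out[[1]], out[[2]], by = "CustomerId",
              all = TRUE, suffixes = c("_1", "_2"))
```

If long form is the end goal, stacking with `rbindlist()` may be simpler than joining and reshaping afterwards, since every table already has the same four columns after the subset.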
Simon O'Hanlon
  • This yields the following error message (which I'm having a tough time interpreting, as I'm not an expert in data.table): `Error in `:=`(csat, recode(csat, , "1:5=0;6:7=1;NA=NA;")) : := is defined for use in j only, and (currently) only once; i.e., DT[i,col:=1L] and DT[,newcol:=sum(colB),by=colA] are ok, but not DT[i,col]:=1L, not DT[i]$col:=1L and not DT[,{newcol1:=1L;newcol2:=2L}]. Please see help(":="). Check is.data.table(DT) is TRUE. ` – roody Jan 30 '14 at 18:35
  • @roody I think to get further help, you should make your example a reproducible one... Try reading [**how to make a great reproducible example**](http://stackoverflow.com/q/5963269/1478381) – Simon O'Hanlon Jan 30 '14 at 18:51
  • Thanks for linking to the tutorial! Super helpful, and now sample data added to the post above. – roody Jan 30 '14 at 21:06
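
A likely cause of the `:=` error quoted above is the `c(df,df2,df3)` line in the question: `c()` concatenates the columns of the tables into one plain list, so each element handed to `lapply` is a bare vector rather than a `data.table`. The toy tables below are made up purely to illustrate the difference from `list()`.

```r
library(data.table)

# Hypothetical toy tables, only for illustration
df1 <- data.table(a = 1, b = 2)
df2 <- data.table(a = 3, b = 4)

flat <- c(df1, df2)      # columns of both tables concatenated into one list
kept <- list(df1, df2)   # one list element per table

length(flat)                  # one element per column, not per table
sapply(flat, is.data.table)   # FALSE for every element
sapply(kept, is.data.table)   # TRUE for every element
```

With `kept`, each `x` inside `lapply` is a real `data.table`, so `x[ , col := value ]` is valid.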