Ifelse statement with nested data.table statements

Question

This is my data frame.

library(data.table)
     dt <- fread('
 Name     Video   Webinar Meeting Conference  Level  NextStep
  John       1         0        0       0      1     Webinar,Meeting,Conference
  John       1         1        0       0      1     Meeting,Conference
  John       1         1        1       0      2     Conference      
  Tom        0         0        1       0      1     Webinar,Conference,Video
  Tom        0         0        1       1      2     Webinar,Video   
  Kyle       0         0        0       1      2     Webinar,Meeting,Video

                                ')

I am creating the nextstep column by doing this

dt[, nextstep := paste0(names(.SD)[.SD==0], collapse = ','), 1:nrow(DT), .SDcols = 2:5][]

according to the solution here Making a character string with column names with zero values

Now I want to change the order of how the elements show up in the next step column based on the 'Level' field. For example, if it is level 1, I want conference to show up before Webinar & Meeting. If level 2, I want video to always show up last.This is my attempt.

 dt<-dt[, NextStep := ifelse(Level1=="Level0",
(paste0(names(.SD)[.SD==0], collapse = ';'), 1:nrow(dt), .SDcols = c(5,2,3,4)),
      ifelse(EngagementLevel1=="Level2",
(paste0(names(.SD)[.SD==0], collapse = ';'), 1:nrow(dt), .SDcols = c(3,4,5,2))))]

I am just trying to reorder the elements within 'nextstep' field based on the 'Level' field. Sincerely appreciate your help!

Frank · Accepted Answer · 2016-06-13T20:41:37.000

Well, you can put your preferred ordering somewhere:

levelmap = data.table(Level = 1:2, ord = list(
    c("Conference", "Webinar", "Meeting", "Video"), 
    c("Webinar", "Meeting", "Conference", "Video")
))

And then use your previous approach:

DT[, r := .I]
for (ii in seq(nrow(levelmap)))
    DT[ Level == levelmap$Level[ii], 
      ns := paste0(names(.SD)[.SD==0], collapse = ',')
    , by = r, .SDcols = levelmap$ord[[ii]] ][]

But really, I think you should not be doing this at all (what's in this question nor the previous one). It's a messy way to handle data.

Comment on tidy data. To clarify what I mean by that, I would suggest reviewing Hadley Wickham's paper on tidy data. Tidy data here would probably look like this:

myDT = melt(
  DT[, !"NextStep", with=FALSE][, Seq := 1:.N, by=Name], 
  id.var = c("Name", "Seq", "Level"))

    Name Seq Level   variable value
 1: John   1     1      Video     1
 2: John   2     1      Video     1
 3: John   3     2      Video     1
 4:  Tom   1     1      Video     0
 5:  Tom   2     2      Video     0
 6: Kyle   1     2      Video     0
 7: John   1     1    Webinar     0
 8: John   2     1    Webinar     1
 9: John   3     2    Webinar     1
10:  Tom   1     1    Webinar     0
11:  Tom   2     2    Webinar     0
12: Kyle   1     2    Webinar     0
13: John   1     1    Meeting     0
14: John   2     1    Meeting     0
15: John   3     2    Meeting     1
16:  Tom   1     1    Meeting     1
17:  Tom   2     2    Meeting     1
18: Kyle   1     2    Meeting     0
19: John   1     1 Conference     0
20: John   2     1 Conference     0
21: John   3     2 Conference     0
22:  Tom   1     1 Conference     0
23:  Tom   2     2 Conference     1
24: Kyle   1     2 Conference     1
    Name Seq Level   variable value

Or maybe you would even drop all the rows that are zeros or that are ones (since they are fairly redundant).

The idea is that this would be your main data that you use to do any analysis or build any summary tables. In your case, the goal is a summary table (as far as I can tell), like

library(magrittr)
res = myDT[levelmap, on="Level"][, .( NextStep = 
  variable[value == 0] %>% factor(levels = ord[[1]]) %>% sort %>% toString
), keyby=.(Name, Seq, Level)]

   Name Seq Level                     NextStep
1: John   1     1 Conference, Webinar, Meeting
2: John   2     1          Conference, Meeting
3: John   3     2                   Conference
4: Kyle   1     2      Webinar, Meeting, Video
5:  Tom   1     1   Conference, Webinar, Video
6:  Tom   2     2               Webinar, Video

If you really wanted the 0/1 columns, you could also include them with dcast (which transforms data from long to wide):

cbind(
  res, 
  dcast(myDT, Name + Seq ~ variable, value.var="value")[, !c("Name", "Seq"), with=FALSE])

Thanks a lot for you answer Frank.Kindly let me know what you are referring to when you are saying it is a messy way to handle data. Do you mean the way I am generating the nextstep column? — gibbz00, Jun 13 '16 at 20:14
@gibbz00 Basically, I mean that you should not perform analysis with data in wide format (where you have "data" stored as column names). — Frank, Jun 13 '16 at 20:43
Thank you so much for the detailed explanation! I have learnt so much from you. — gibbz00, Jun 14 '16 at 13:55

Ifelse statement with nested data.table statements

1 Answers1

Linked