I have a small data.table with one record per test cell (A/B testing results), and I want to add several columns that compare each test cell against every other test cell. In other words, the number of columns to add depends on how many test cells are in the A/B test in question.

My data.table looks like:

Group   Delta     SD.diff
Control     0           0
Cell1 0.00200 0.001096139
Cell2 0.00196 0.001095797
Cell3 0.00210 0.001096992
Cell4 0.00160 0.001092716

And I want to add the following columns (numbers are trash here):

Group v.Cell1    v.Cell2   v.Cell3   v.Cell4
Control  0.45       0.41      0.45      0.41 
Cell1    0.50       0.58      0.48      0.66
Cell2    0.58       0.50      0.58      0.48
Cell3    0.48       0.58      0.50      0.70
Cell4    0.66       0.48      0.70      0.50

I am sure that do.call is the way to go, but I can't work out how to embed one do.call inside another to generate the script, and I can't work out how to then execute the generated scripts (20 lines in total). The closest I have come so far is:

a <- do.call("paste",c("test.1.results <- mutate(test.1.results, P.Better.",list(unlist(test.1.results[,Group]))," = pnorm(Delta, test.1.results['",list(unlist(test.1.results[,Group])),"'][,Delta], SD.diff,lower.tail=TRUE))", sep=""))

Which produces 5 script lines like:

test.1.results <- mutate(test.1.results, P.Better.Cell2 = pnorm(Delta, test.1.results['Cell2'][,Delta], SD.diff,lower.tail=TRUE))

This only compares one test cell's results against itself, giving a 0.50 result (difference due to chance). That is no use whatsoever, as I need each test cell compared to every other.

Not sure where to go with this one.

Romain Francois
Andrew Dempsey
  • Given you are using the package data.table, why are you using `mutate`? This is not using any of data.table's power. – mnel Dec 13 '12 at 21:58
  • I am forcing myself to learn data.table. It is not ideal in this scenario, but it behaves like a data frame most of the time. If I need to go back to trustworthy data frames I will – Andrew Dempsey Dec 13 '12 at 22:07
  • @AndrewDempsey Including the adjective trustworthy in that context isn't likely to keep me on side and motivated to help, btw. – Matt Dowle Dec 13 '12 at 22:40
  • "trustworthy" in the context that I am more familiar with data.frames. My sum total exposure to data.table is the code I am writing now which is taking longer as I figure out the syntax of filtering, selecting and grouping for data.table. Project deadlines mean I likely chose the wrong time to play around with data.table as speed is not really an issue – Andrew Dempsey Dec 13 '12 at 22:46
  • Then it's the wrong word. Seems like you meant familiar. Yes, sounds like this is the wrong time to learn data.table, especially if you don't really need it. – Matt Dowle Dec 13 '12 at 22:53
  • Agreed. might just retrofit DF – Andrew Dempsey Dec 13 '12 at 22:58

1 Answer

Update: In v1.8.11, FR #2077 is now implemented: set() can now add columns by reference. From NEWS:

set() is able to add new columns by reference now. For example, set(DT, i=3:5, j="bla", 5L) is equivalent to DT[3:5, bla := 5L]. This was FR #2077. Tests added.


Tasks like this are often easier with set(). To demonstrate, here's a translation of what you have in the question (untested). I realise you want something different from what you've posted, but I don't quite understand what from a quick read.

for (i in paste0("Cell", 1:4))
  set(DT,                       # the data.table to update/add column by reference
    i = NULL,                   # no row subset, NULL is default anyway
    j = paste0("P.Better.", i), # column name or position. must be name when adding
    # DT[i] is a keyed lookup, so this assumes setkey(DT, Group) has been called
    value = pnorm(DT$Delta, DT[i][, Delta], DT$SD.diff, lower.tail = TRUE))
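For reference, here is a self-contained sketch of the same loop using the data from the question. It assumes the intended comparison is pnorm(Delta, mean = the reference cell's Delta, sd = the row's own SD.diff), as in the snippet above; whether that is the right statistic is for you to confirm. The `v.` column names are taken from the desired output in the question, and the lookup uses `Group == g` instead of a key so that no setkey() call is needed.

```r
library(data.table)

# Data copied from the question
DT <- data.table(
  Group   = c("Control", "Cell1", "Cell2", "Cell3", "Cell4"),
  Delta   = c(0, 0.00200, 0.00196, 0.00210, 0.00160),
  SD.diff = c(0, 0.001096139, 0.001095797, 0.001096992, 0.001092716)
)

# One new column per non-control cell, however many there are
cells <- setdiff(DT$Group, "Control")

for (g in cells) {
  ref <- DT[Group == g, Delta]   # Delta of the reference cell (length 1)
  set(DT,
      j     = paste0("v.", g),   # new column, added by reference
      value = pnorm(DT$Delta, mean = ref, sd = DT$SD.diff))
}

print(DT)
```

Because `cells` is derived from the Group column, the number of added columns follows the number of test cells automatically. Each cell compared with itself gives 0.5, as noted in the question.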

Note that you can add only a subset of a new column and the rest will be filled with NA. This works with both := and set().
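A minimal sketch of that NA-filling behaviour, on a toy table (the names `id`, `a` and `b` are made up for illustration, and set() adding a column needs v1.8.11 or later):

```r
library(data.table)

DT <- data.table(id = 1:5)

DT[2:3, a := 10L]                        # := on a row subset creates column a
set(DT, i = 4:5, j = "b", value = 20L)   # set() on a row subset creates column b

print(DT)   # rows outside each subset hold NA in the new column
```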

Arun
Matt Dowle
  • Error, but easy enough for me to fix this one: Cannot add columns with set(), use := instead to add columns by reference – Andrew Dempsey Dec 13 '12 at 22:57
  • Actually, I am switching to data.frames as I am only half way through what I need to do. Should be quick to rework this half and then much quicker to do the next half – Andrew Dempsey Dec 13 '12 at 23:09
  • This looks very promising. I couldn't get to run without error: dput(DT) structure(list(Group = structure(c(5L, 1L, 2L, 3L, 4L), .Label = c("Cell1", "Cell2", "Cell3", "Cell4", "Control"), class = "factor"), Delta = c(0, 0.002, 0.00196, 0.0021, 0.0016), SD.diff = c(0, 0.001096139, 0.001095797, 0.001096992, 0.001092716)), .Names = c("Group", "Delta", "SD.diff"), row.names = c(NA, -5L), class = c("data.table", "data.frame"), .internal.selfref = ) – IRTFM Dec 13 '12 at 23:10
  • @Dwin and Andrew, Oops, indeed. Ok I've updated [FR2077 set() needs to be able to add new columns](https://r-forge.r-project.org/tracker/index.php?func=detail&aid=2077&group_id=240&atid=978) – Matt Dowle Dec 14 '12 at 01:28