
In data.table it is possible to have columns of type list, and I'm trying to benefit from this feature for the first time. For each row of my table dt I need to store several comments taken from an rApache web service. Each comment will have a username, datetime, and body item.

Instead of using long strings with some unusual character (like |) to separate each message from the others, and a ; to separate the items within a comment, I thought I would use lists like this:

library(data.table)
dt <- data.table(id=1:2,
        comment=list(list(
            list(username="michele", date=Sys.time(), message="hello"),
            list(username="michele", date=Sys.time(), message="world")),
          list(
            list(username="michele", date=Sys.time(), message="hello"),
            list(username="michele", date=Sys.time(), message="world"))))

> dt
   id comment
1:  1  <list>
2:  2  <list>

to store all the comments added to one particular row (also because this will be easier to convert to JSON later on, when I need to send it back to the UI).
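With this layout each cell of `comment` is itself a list of comments, so a single field is reached by indexing down one list level at a time. A minimal sketch that rebuilds a small version of the table; the fixed timestamp `ts` is an assumption, used only so the example is reproducible:

```r
library(data.table)

# Fixed timestamp (assumption) so the example gives the same result on every run
ts <- as.POSIXct("2014-03-20 13:00:00", tz = "UTC")

dt <- data.table(id = 1:2,
  comment = list(
    list(list(username = "michele", date = ts, message = "hello"),
         list(username = "michele", date = ts, message = "world")),
    list(list(username = "michele", date = ts, message = "hello"))))

# Column -> cell of row 1 (a list of comments) -> second comment -> named field
dt$comment[[1L]][[2L]]$message   # "world"

# Number of comments stored in each row
lengths(dt$comment)              # 2 1
```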

However, when I try to simulate how I will actually be filling my table in production (adding a single comment to a particular row), R either crashes, or doesn't assign what I would like and then crashes:

> library(data.table)
> dt <- data.table(id=1:2, comment=vector(mode="list", length=2))
> dt$comment
[[1]]
NULL

[[2]]
NULL

> dt[1L, comment := 1] # this works
> dt$comment
[[1]]
[1] 1

[[2]]
NULL

> set(dt, 1L, "comment", list(1, "a"))  # assigns only `1`, and R crashes when I try to print `dt`
Warning message:
In set(dt, 1L, "comment", list(1, "a")) :
  Supplied 2 items to be assigned to 1 items of column 'comment' (1 unused)

> dt[1L, comment := list(1, "a")]       # R crashes as soon as I run
> dt[1L, comment := list(list(1, "a"))] # either of these two

I know I may be misusing data.table; for example, the j argument is designed to allow this:

dt[1L, c("id", "comment") := list(1, "a")] # elements of the RHS list are treated as separate columns, not as parts of one
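This follows from how `:=` reads its right-hand side: the elements of the RHS `list()` are matched to the LHS columns one by one. A small sketch, assuming a recent data.table version; the values `10L` and `list("a")` are arbitrary, and `list("a")` is used so that the `comment` column receives a list value:

```r
library(data.table)

dt <- data.table(id = 1:2, comment = vector("list", 2L))

# Two LHS columns, two RHS elements: 10L goes to `id`, list("a") goes to `comment`
dt[1L, c("id", "comment") := list(10L, list("a"))]

dt$id             # 10 2
dt$comment[[1L]]  # "a"
```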

Question: is there a way to do the assignment I want? Or do I have to copy dt$comment out into a variable, modify it, and re-assign the whole column every time I need to make an update?

smci
Michele
  • You could probably use `rbind` and/or `merge` to successively update your `data.table`, but that sounds very inefficient. Other than that I can only say that I ran into the following warning message: "Column 'comment' is type 'list' which is not supported as a key column type, currently." – shadow Mar 20 '14 at 13:27
  • `dt[1L, comment := list(1L)]` - you have to use `list(.)` as the column type is `list`. `set(dt, 1, "comment", list(1))` - `list(1, "a")` is of length 2, and you're assigning it to `i=1` (which is of length 1). – Arun Mar 20 '14 at 14:05
  • @arun can you please write an answer with a reproducible code? because I don't I understood. Just to clarify: I need something like `list(1, "a")` inside 1 cell of the table, precisely in an element of a column of type `list` as defined above – Michele Mar 20 '14 at 14:23
  • @shadow I think you need to update `data.table` – Michele Mar 20 '14 at 14:24
  • @Arun `I don't I understood` means `I think I haven't understood` sorry... I also tried to assign `list(list(1, "a"))`, which is of length 1, but `R` still crashes. – Michele Mar 20 '14 at 14:29

2 Answers


Using :=:

dt = data.table(id = 1:2, comment = vector("list", 2L))

# assign value 1 to just the first row of 'comment'
dt[1L, comment := 1L]

# assign value of 1 and "a" to rows 1 and 2
dt[, comment := list(1, "a")]

# assign value of "a","b" to row 1, and 1 to row 2 for 'comment'
dt[, comment := list(c("a", "b"), 1)]

# assign list(1, "a") to just 1 row of 'comment'
dt[1L, comment := list(list(list(1, "a")))]

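For the last case you need one more list: data.table uses the outer list(.) to find the values to assign to each column by reference, the next list holds the values of the list column itself (one element per row being assigned), and the innermost list(1, "a") is what ends up in the cell.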
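The three levels can be labelled directly in the call, and the result checked by reading the cell back. A minimal sketch, assuming a recent data.table version:

```r
library(data.table)

dt <- data.table(id = 1:2, comment = vector("list", 2L))

# Three nested list() calls, from the outside in:
#   1. `:=` syntax: one element per column on the LHS (here only `comment`)
#   2. the list column's values: one element per row selected by i (here 1 row)
#   3. the content of the cell itself
dt[1L, comment := list(list(list(1, "a")))]

str(dt$comment[[1L]])      # a list of 2: the number 1 and the string "a"
is.null(dt$comment[[2L]])  # TRUE: row 2 is untouched
```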

Using set:

dt = data.table(id = 1:2, comment = vector("list", 2L))

# assign value 1 to just the first row of 'comment'
set(dt, i=1L, j="comment", value=1L)

# assign value of 1 and "a" to rows 1 and 2
set(dt, j="comment", value=list(1, "a"))

# assign value of "a","b" to row 1, and 1 to row 2 for 'comment'
set(dt, j="comment", value=list(c("a", "b"), 1))

# assign list(1, "a") to just 1 row of 'comment'
set(dt, i=1L, j="comment", value=list(list(list(1, "a"))))
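The question's production scenario, adding one comment at a time to a particular row, can be sketched with `set()` and the same three levels of `list()`. This is only a sketch under stated assumptions: `add_comment()` is a hypothetical helper, not part of data.table, the fixed timestamp is for reproducibility, and the helper rebuilds the cell's list of comments on every call:

```r
library(data.table)

# Hypothetical helper (not a data.table function): append one comment to row `row`
# of the list column `comment`, keeping the comments already stored there
add_comment <- function(dt, row, username, date, message) {
  new_comment <- list(username = username, date = date, message = message)
  updated     <- c(dt$comment[[row]], list(new_comment))  # an empty (NULL) cell becomes a list of 1
  # Three levels: set() syntax, column values for the one row, then the cell content (`updated`)
  set(dt, i = row, j = "comment", value = list(list(updated)))
  invisible(dt)
}

dt <- data.table(id = 1:2, comment = vector("list", 2L))
ts <- as.POSIXct("2014-03-20 13:00:00", tz = "UTC")  # fixed time (assumption)

add_comment(dt, 1L, "michele", ts, "hello")
add_comment(dt, 1L, "michele", ts, "world")

length(dt$comment[[1L]])        # 2
dt$comment[[1L]][[2L]]$message  # "world"
```

Because `set()` modifies `dt` by reference, the helper does not need to return the table for the change to be visible to the caller.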

HTH


I'm using the current development version, 1.9.3, but this should work fine on other versions as well.

> sessionInfo()
R version 3.0.3 (2014-03-06)
Platform: x86_64-apple-darwin10.8.0 (64-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] data.table_1.9.3

loaded via a namespace (and not attached):
[1] plyr_1.8.0.99  reshape2_1.2.2 stringr_0.6.2  tools_3.0.3   
Arun
  • Thanks a million, I was missing one `list`: I tried with only 2, not 3. I was misled by `dt[1, comment := 1]` inserting `1` inside the first list element, so I thought `:= list(list(1, "a"))` should put `list(1, "a")` inside the first list element. One question, Arun: why do `dt[1, comment := list(list(1))]` and `dt[1, comment := 1]` give the same result? – Michele Mar 20 '14 at 15:06
  • @Michele, Great question! I think it shouldn't. It should give a type mismatch error, IIUC. I'm not sure if I'd call it a bug, but it's still an inconsistency. So, could you please file a bug report [**here**](https://r-forge.r-project.org/tracker/?atid=975&group_id=240&func=browse)? – Arun Mar 20 '14 at 15:13
  • awesome. `:= list(list(1, "a"))` solved my problem. Great answer. Thanks! – Paul 'Joey' McMurdie Jun 05 '15 at 23:28
  • @Arun, I'm using data.table 1.9.7 and got type-coercion warnings for several of the `:=` examples. And how can I assign a vector to a cell? `dt[, comment := list(c("a", "b"), 1)]` works for all rows, but assigning one cell with `dt[1, comment := list(c("a", "b"))]` or `dt[1, comment := c("a", "b")]` doesn't give the right result. – dracodoc Sep 21 '16 at 16:56
  • OK, the problem is probably that we need [[]] to access the list item. The regular method, `dt$comment[[1]] <- c("a", "b")`, works, though I still don't know how to do it with data.table's `j` syntax. – dracodoc Sep 21 '16 at 17:02
  • Use `list(list(...))`. The first list is for the syntax. The second one is for your column. See the reference semantics vignette. – Arun Sep 21 '16 at 18:12
  • @Arun, I knew about the extra list requirement, but I'm trying to use a list column that holds a vector in each cell, just like Matt's example below. `dt[, comment := list(c("a", "b"), 1)]` works as I expected and puts a vector in the first row, but I'm trying to assign one cell at a time, not the whole column, and `dt[1, comment := list(c("a", "b"))]` doesn't work (it gives a warning and the result is not right). I later found out that `dt$comment[[1]] <- c("a", "b")` works, but I'm wondering if there is a syntax inside `j` that can work. – dracodoc Sep 22 '16 at 00:45
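Following Arun's reply, a whole vector can be put into one cell with two levels of list(): the outer one for the `:=` or `set()` syntax, and the inner one for the list column (one element per assigned row). A minimal sketch using the `c("a", "b")` value from these comments; the expected results assume a recent data.table version:

```r
library(data.table)

dt <- data.table(id = 1:2, comment = vector("list", 2L))

# Outer list: `:=` syntax; inner list: one element per row selected by i
dt[1L, comment := list(list(c("a", "b")))]

# The same kind of assignment with set(), on row 2
set(dt, i = 2L, j = "comment", value = list(list(c("c", "d"))))

dt$comment[[1L]]  # "a" "b"
dt$comment[[2L]]  # "c" "d"
```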

Just to add more info, what list columns are really designed for is when each cell is itself a vector:

> DT = data.table(a=1:2, b=list(1:5,1:10))
> DT
   a            b
1: 1    1,2,3,4,5
2: 2 1,2,3,4,5,6,

> sapply(DT$b, length)
[1]  5 10 

Notice the pretty printing of the vectors in the b column. Those commas are just for display; each cell is actually a vector (as shown by the sapply command above). Note also the trailing comma on the 2nd item of b. That indicates that the vector is longer than displayed (data.table just displays the first 6 items).
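Because each cell holds the whole vector, the truncated display does not lose anything. A short check; `lengths()` is base R (available since R 3.2.0) and gives the same result as the `sapply()` call above:

```r
library(data.table)

DT <- data.table(a = 1:2, b = list(1:5, 1:10))

lengths(DT$b)    # 5 10
DT$b[[2L]]       # the full vector 1:10, not just the 6 printed items
sum(DT$b[[2L]])  # 55
```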

Or, closer to your example:

> DT = data.table(id=1:2, comment=list( c("michele", Sys.time(), "hello"),
                                        c("michele", Sys.time(), "world") ))
> DT
   id                       comment
1:  1 michele,1395330180.9278,hello
2:  2 michele,1395330180.9281,world 
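The numbers in the printed cells come from `c()`: since the first element is a character string, the POSIXct time is coerced to its underlying count of seconds and loses its class. A minimal sketch with a fixed time (an assumption, for reproducibility); formatting the time before combining is one way to keep it readable, and either string can be converted back when needed:

```r
ts <- as.POSIXct("2014-03-20 13:00:00", tz = "UTC")

# c() with a character first element turns the time into a plain number string
cell_raw <- c("michele", ts, "hello")
cell_raw[2]                          # "1395320400"

# Formatting the time first keeps it human-readable inside the character vector
cell_fmt <- c("michele", format(ts, "%Y-%m-%d %H:%M:%S"), "hello")
cell_fmt[2]                          # "2014-03-20 13:00:00"

# Converting the formatted string back to a date-time
as.POSIXct(cell_fmt[2], tz = "UTC")
```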

What you're trying to do is not only have a list column, but put a list into each cell as well, which is why <list> is being displayed. Additionally, if you place named lists into each cell, beware that all those names will use up space. Where possible, a list column of vectors may be easier.
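To get a feel for the space cost, one comment stored as a named list can be compared with the same fields stored as a plain character vector. A rough sketch: the date is kept as a string in both (an assumption, so that only the container differs), and exact byte counts vary by platform and R version, so only the comparison is meaningful:

```r
# One comment stored as a named list, as in the question
named   <- list(username = "michele", date = "2014-03-20 13:00:00", message = "hello")
# The same fields stored as an unnamed character vector, as in the answer's example
unnamed <- c("michele", "2014-03-20 13:00:00", "hello")

# The named list holds three separate vectors plus a names attribute,
# and that overhead is repeated in every cell of a list column
object.size(named)
object.size(unnamed)
as.numeric(object.size(named)) > as.numeric(object.size(unnamed))  # TRUE
```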

Matt Dowle
  • Thanks a lot for the hint. I think I'll be fine however. I'm not using data.table for its speed (this time). – Michele Mar 20 '14 at 16:20