
I have a character vector x which I want to turn into a one row data.table in a speedy way. The command data.table(x) returns a one column data.table. Now, data.table(t(x)) gets the job done but I'm wondering if there's a faster way.

NewNameStat
  • The real question is: why do you want to do this? It makes little sense to have a table of just one row where each element is of the same kind. Are you sure about that? A standard `character` vector (or even a matrix with just one row) seems perfect for this kind of object. – nicola Mar 25 '16 at 14:08
  • @nicola It's for rbinding purposes. I have a list of different sized vectors which I want to convert into a data.table to analyze it. – NewNameStat Mar 25 '16 at 14:23
  • If that's the case, you should `rbind` first and then convert to a `data.table`. Converting each row to a `data.table` and `rbind`ing them afterwards is worse. – nicola Mar 25 '16 at 14:24
  • @nicola If `y` is my list of character vectors, `rbindlist(lapply(y, as.list))` does seem to be a lot faster! Now I'm getting a bit greedy. I still have to loop through `y`. Is there an even faster way? `rbindlist` is a new function in my lexicon. Is there some other function that can handle this without first converting everything to a list? – NewNameStat Mar 25 '16 at 14:49
  • I suggest you ask another question, stating exactly what your inputs and desired output are. I can't tell what's better for you at this stage. – nicola Mar 25 '16 at 15:06
  • See [#1244](https://github.com/Rdatatable/data.table/issues/1244) which would expand `setDT` to this case. Also see [this](http://stackoverflow.com/questions/30344192/sub-assign-by-reference-on-vector-in-r) question of jangorecki's / corresponding answer from eddi – MichaelChirico Mar 25 '16 at 15:33
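
The two approaches discussed in the comments above can be compared side by side. This is only a minimal sketch: it assumes every vector in the list has the same length, and `y` is a made-up example list, not the asker's actual data.

```r
library(data.table)

# Hypothetical input: a list of equal-length character vectors
y <- list(c("a", "b", "c"), c("d", "e", "f"))

# Approach from the comments: turn each vector into a list, then bind the rows
dt_rbindlist <- rbindlist(lapply(y, as.list))

# nicola's suggestion: rbind into a matrix first, then convert once
dt_matrix <- as.data.table(do.call(rbind, y))

dim(dt_rbindlist)
dim(dt_matrix)
```

Both objects should have one row per vector and one column per element. Which one is faster will depend on the size and shape of the real input.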

1 Answer


We could use

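library(data.table)
x <- 1:5
# as.list() makes each element its own column; the trailing [] prints the result
setDT(as.list(x))[]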

Benchmarks

v1 <- 1:1e5
system.time(data.table(t(v1)))
#   user  system elapsed 
#  12.95    0.01   12.97 
system.time(setDT(as.list(v1)))
#  user  system elapsed 
#   5.75    0.00    5.75 

system.time(as.data.table(t(v1)))
#    user  system elapsed 
#   6.35    0.00    6.34 
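
Before comparing timings, it may be worth confirming that the three calls produce the same shape. The sketch below uses a smaller vector `v` (an assumption made for illustration, not the benchmark input) and only checks dimensions and values. It makes no claim about speed.

```r
library(data.table)

v <- 1:1000  # smaller than the benchmark vector; used only to check shapes

dt_a <- data.table(t(v))      # transpose to a 1-row matrix, then build
dt_b <- as.data.table(t(v))   # transpose, then convert
dt_c <- setDT(as.list(v))     # one list element per column, converted in place

dim(dt_a)
dim(dt_b)
dim(dt_c)
```

Each result should be a single row with one column per element of `v`.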

Update

If the purpose of the above exercise is to rbind a vector with a data.table, we don't need to convert the vector to a data.table first

 d1 <- data.table(V1= 1:3, V2= 4:6, V3=7:9)
 rbindlist(list(d1, as.list(1:3)))
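
The same idea might extend to appending several vectors at once. In this sketch `rows` is a hypothetical list of new rows, each with as many elements as `d1` has columns. The unnamed lists are matched to the columns by position.

```r
library(data.table)

d1 <- data.table(V1 = 1:3, V2 = 4:6, V3 = 7:9)

# Hypothetical new rows, kept as plain vectors
rows <- list(1:3, 10:12)

# Convert each vector to a list and bind everything in a single call
res <- rbindlist(c(list(d1), lapply(rows, as.list)))
res
```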
akrun
  • Also `as.data.table(t(v1))` seems faster (and similar to setDT) – digEmAll Mar 25 '16 at 14:10
  • Very interesting! I have a couple of questions: 1) My understanding is that the difference between setDT() and data.table() has to do with memory allocation. Why do setDT(as.list(x)) and as.data.table(as.list(x)) return different things? 2) What does [] do at the end of setDT? – NewNameStat Mar 25 '16 at 14:22
  • @NewNameStat What do you mean by different things? They return the same 5 columns (based on the example). The `[]` is to print the output – akrun Mar 25 '16 at 14:24
  • @akrun As you have it in your answer, they're all returning the same thing. Setting list_x = as.list(x), I'm wondering why setDT(list_x) and as.data.table(list_x) don't return the same thing. The former returns a one row data.table and the latter returns a one column data.table. – NewNameStat Mar 25 '16 at 14:28
  • @NewNameStat Again, it returns the same thing for me, i.e. a one row data.table. `dim(as.data.table(list_x)) #[1] 1 5 dim(setDT(list_x)) #[1] 1 5` Based on the comments to nicola, I think you don't need to convert to `data.table`. Just use it as a `list` and then place it in the `rbindlist` – akrun Mar 25 '16 at 14:30
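
One way to see the practical difference between the two calls discussed in these comments is that `setDT()` converts its argument by reference and returns it invisibly, while `as.data.table()` returns a new object. A small sketch follows; `was_dt_before` and `dt_copy` are names introduced here for illustration.

```r
library(data.table)

list_x <- as.list(1:5)

# as.data.table() returns a new object and leaves list_x a plain list
dt_copy <- as.data.table(list_x)
was_dt_before <- is.data.table(list_x)

# setDT() converts list_x itself; nothing prints because the result is invisible
setDT(list_x)
is.data.table(list_x)

dim(dt_copy)
dim(list_x)
```

Because the return value is invisible, appending `[]` as in the answer is what makes the result print at the console.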