
I have three tables:

library(data.table)
tbl.1 <- data.table("A" = runif(5), "B" = runif(5))
tbl.2 <- data.table("A" = runif(5), "B" = runif(5))
tbl.3 <- data.table("A" = runif(5), "B" = runif(5))

I would like to iterate through the tables with a loop such as

for (i in 1:3) {
  # Open tbl.i
  # Do something
}

How can this be done? I can put the tables in a list and iterate through the list, which works OK. However, I am trying to keep the tables as separate objects for various reasons. Thanks.

LDBerriz
  • Please be specific about what you want to do, instead of `Do something`. Also, if you use `set.seed` and give the expected output, the answers can be more specific. – akrun Jan 14 '16 at 18:09
  • Like rawr said: don't do this. Rather `my_tbl = rbind(tbl.1,tbl.2,tbl.3, idcol=TRUE); my_tbl[,{ ... do stuff ... }, by=.id]` assuming the operations on each table within the loop are independent. – Frank Jan 14 '16 at 18:18
  • R is not set up to do this well; R is set up to work with data frames in lists. That's why there's no good/easy answer to the question you're asking. Why do you need to have separate objects? I'd be surprised if that were really the case. – Señor O Jan 14 '16 at 18:20
  • 'Do something' means that what goes in that part of the loop is not part of the problem I am trying to solve. In my case I am adding a new column "C" based on the value of "A" + "B". – LDBerriz Jan 14 '16 at 18:25
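A minimal sketch of the stack-and-group approach from Frank's comment, applied to the task described in the last comment (adding `C` as `A` + `B`). It assumes the tables share the same columns, as in the question; the names `my_tbl`, `per_table`, `n` and `meanC` are illustrative only.

```r
library(data.table)

set.seed(1)  # reproducible example data, as suggested in the comments
tbl.1 <- data.table(A = runif(5), B = runif(5))
tbl.2 <- data.table(A = runif(5), B = runif(5))
tbl.3 <- data.table(A = runif(5), B = runif(5))

# Stack the tables; idcol = TRUE adds a ".id" column (1, 2, 3)
# recording which input table each row came from.
my_tbl <- rbind(tbl.1, tbl.2, tbl.3, idcol = TRUE)

# The task from the comments: add C = A + B. This is row-wise,
# so it needs no grouping.
my_tbl[, C := A + B]

# Operations that must run per original table go in `by = .id`,
# e.g. a per-table summary:
per_table <- my_tbl[, .(n = .N, meanC = mean(C)), by = .id]
print(per_table)
```

Because `C := A + B` works row by row, grouping does not change its result; `by = .id` only matters for genuinely per-table work such as the summary.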

4 Answers


If you don't want to keep the data.tables in a list, you can refer to them by name in the environment where they live. In this example that is the global environment. If your data.tables are created inside a function or some other package, you would need to pass that environment instead.

library(data.table)
tbl.1 <- data.table("A" = runif(5), "B" = runif(5))
tbl.2 <- data.table("A" = runif(5), "B" = runif(5))
tbl.3 <- data.table("A" = runif(5), "B" = runif(5))
for (i in paste0("tbl.",1:3)) {
    # Open tbl.i: get
    # Do something: str
    str(get(i, envir = .GlobalEnv))
}
jangorecki
  • Indeed! And if you wanted to give them their own environment, one can be created with `new.env()`. – SJWard Jan 14 '16 at 18:30
  • Additionally, this might not behave as you expect if you modify things: http://stackoverflow.com/questions/31120295/r-variable-names-in-loop-get-etc/31122283#31122283 – SJWard Jan 14 '16 at 18:32
  • @Ward9250 Good point. With data.table it is somewhat easier, as you usually want to modify in place by reference; otherwise `copy()` can always be used. – jangorecki Jan 14 '16 at 19:11
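A sketch combining the two suggestions in these comments: keep the tables in their own environment (created with `new.env()`), modify them by reference inside the loop, and use `copy()` when a change should not touch the stored table. `tbl_env`, `scratch` and the column `D` are hypothetical names used only for illustration.

```r
library(data.table)

# Hypothetical dedicated environment holding the tables
tbl_env <- new.env()
for (k in 1:3) {
  assign(paste0("tbl.", k), data.table(A = runif(5), B = runif(5)), envir = tbl_env)
}

# get() returns the table object itself, so := adds column C by reference
# and the change persists in tbl_env without any assign().
for (nm in ls(tbl_env, pattern = "^tbl\\.")) {
  get(nm, envir = tbl_env)[, C := A + B]
}

# To experiment without altering the stored table, work on a deep copy.
scratch <- copy(get("tbl.1", envir = tbl_env))
scratch[, D := A * 2]
```

The `copy()` step is what protects `tbl.1`: a plain `scratch <- get("tbl.1", envir = tbl_env)` would share the same object, and `:=` on it would change the stored table too.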

LDBerriz,

I believe it is possible to do what you are trying to do by looping through variable names and getting them from .GlobalEnv, which represents the workspace.

However, as several other commenters have suggested, it's far easier to store your tables in a list and loop over the list than to loop over variables in .GlobalEnv:

tbl.1 <- data.table("A" = runif(5), "B" = runif(5))
tbl.2 <- data.table("A" = runif(5), "B" = runif(5))
tbl.3 <- data.table("A" = runif(5), "B" = runif(5))

tblList <- list(tbl.1, tbl.2, tbl.3)

for (i in seq_along(tblList)) {
  tbl <- tblList[[i]]
  # Do something with tbl.
}
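A variation of this loop, assuming the objects follow the `tbl.<number>` naming pattern: base R's `mget()` collects them into a named list by name, and `lapply()` applies the same operation to each. Since the list refers to the same tables, `:=` inside the loop may also add `C` to the original `tbl.1` to `tbl.3`; wrap them in `copy()` first if that is unwanted.

```r
library(data.table)

tbl.1 <- data.table(A = runif(5), B = runif(5))
tbl.2 <- data.table(A = runif(5), B = runif(5))
tbl.3 <- data.table(A = runif(5), B = runif(5))

# mget() fetches several objects by name and returns a named list,
# so the list does not have to be typed out by hand.
tblList <- mget(paste0("tbl.", 1:3))

# Apply the same operation to every table; `[` with := returns the
# table (invisibly), so lapply() hands back the modified list.
tblList <- lapply(tblList, function(tbl) tbl[, C := A + B])

names(tblList)  # "tbl.1" "tbl.2" "tbl.3"
```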

For the sake of this answer, I assume that the tables are actually different, or that you have some reason for keeping them as separate tables. Of course, if the columns of the tables all held the same sort of data/variables, as tbl.1, tbl.2, and tbl.3 in your example do, then you could just combine them into one table and work on that single table:

masterTbl <- rbind(tbl.1,tbl.2,tbl.3)

You could even add a column to them so you can identify which table they originally came from, should you need to:

tbl.1$from <- 1
tbl.2$from <- 2
tbl.3$from <- 3

masterTbl <- rbind(tbl.1,tbl.2,tbl.3)

Best, Ben.

SJWard

As others have already indicated, this doesn't seem to be the "data.table" way of doing things, and since you have not been very clear about what you are doing when you say "do something", it's hard to make a good recommendation.

That said, a `for` loop could be fine if your "do something" is all about assignment by reference (for instance, using `set()` or `:=`).

That could be done with a simple loop:

tbl.1 <- data.table("A" = runif(5), "B" = runif(5))
tbl.2 <- data.table("A" = runif(5), "B" = runif(5))
tbl.3 <- data.table("A" = runif(5), "B" = runif(5))

x <- ls(pattern = "^tbl\\.[0-9]+$")  # anchored, so objects such as tblList are not matched

for (i in seq_along(x)) {
  get(x[i])[, C := A + B]
}

tbl.2

If you're not dealing with something that can be solved with assignment by reference (for instance, if you are subsetting or summarizing your data and want to replace the original data.table), then you'll need to use `get()` and `assign()`. (Ugh.)

tbl.1 <- data.table("A" = runif(5), "B" = runif(5))
tbl.2 <- data.table("A" = runif(5), "B" = runif(5))
tbl.3 <- data.table("A" = runif(5), "B" = runif(5))

x <- ls(pattern = "^tbl\\.[0-9]+$")  # anchored, so objects such as tblList are not matched

for (i in seq_along(x)) {
  assign(x[i], get(x[i])[1, ])
}
A5C1D2H2I1M1N2O1R2T1
  • I agree this is not the 'data.table' way of doing things. The only reason to keep the tables separate is that each table is quite big, and combining them in a list creates a very large object which consumes RAM. Thanks everybody for the good answers. – LDBerriz Jan 14 '16 at 19:01
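For the non-reference case in the last answer, a sketch that avoids the explicit `get()`/`assign()` loop: `mget()` pulls the matching tables into a named list, and base R's `list2env()` writes each result back under its original name. The anchored pattern `^tbl\\.[0-9]+$` is an assumption about how the tables are named. As far as I know, building such a list does not by itself duplicate the tables in memory, since R copies objects only when they are modified.

```r
library(data.table)

tbl.1 <- data.table(A = runif(5), B = runif(5))
tbl.2 <- data.table(A = runif(5), B = runif(5))
tbl.3 <- data.table(A = runif(5), B = runif(5))

# Anchored regex: matches tbl.1, tbl.2, ... but not e.g. tblList.
x <- ls(pattern = "^tbl\\.[0-9]+$")
tbls <- mget(x)

# A non-reference operation (keep only the first row) applied to each table;
# list2env() then assigns every list element back under its own name.
# environment() is the global environment when this runs at top level.
invisible(list2env(lapply(tbls, function(dt) dt[1, ]), envir = environment()))
```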

Alternatively, you can use `ls()` with a pattern, so that it directly selects the desired tables. I found that a little easier and more versatile. I also had the issue that the combined data.table would be too large, so I had to split it up and access the pieces separately.

for (tbl in ls(pattern = glob2rx("tbl.*"))) {
    str(get(tbl))
}
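A small sketch of why the `glob2rx()` pattern is safer than a bare `"tbl"`. Here `tblList` is a hypothetical unrelated object whose name also starts with `tbl`. `glob2rx()` (from base R's utils package) turns the wildcard `tbl.*` into the regular expression `^tbl\\.`, so only names beginning with `tbl.` are listed.

```r
library(data.table)

tbl.1 <- data.table(A = runif(5), B = runif(5))
tbl.2 <- data.table(A = runif(5), B = runif(5))
tblList <- list(tbl.1, tbl.2)  # hypothetical object that should NOT be looped over

rx <- glob2rx("tbl.*")         # the regular expression ls() actually receives
narrow <- ls(pattern = rx)     # only names starting with "tbl."
broad <- ls(pattern = "tbl")   # any name containing "tbl", including "tblList"

for (nm in narrow) str(get(nm))
```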
Suraj Rao
hannes101