I suggest you write one function that does everything you have in your `for` loop, and then iterate over a list of table names more simply (or at least more readably). I think I understand your motivation for working by side-effect, assigning each table to its own variable, so I'll stick with that (but offer an alternative anyway). I'll also stick with your use of `dplyr` verbs and go all-in with pipes (for readability).
```r
library(dplyr)

func <- function(tblnm, con, envir = NULL) {
  now <- Sys.time()
  yearseconds <- 365 * 24 * 60 * 60
  dat <- data.frame(tbl(con, tblnm)) %>%
    mutate(exclDate = uploaded_at + yearseconds) %>%
    filter(!(exclDate > now & !(private %in% c(0, NA))) & excluded %in% c(0, NA)) %>%
    select(-exclDate, -excluded, -export_time, -private)
  # optionally assign the result into 'envir' (the side-effect), e.g. the global environment
  if (!is.null(envir)) assign(tblnm, dat, envir = envir)
  invisible(dat)
}
```
With this, you can replace much of the above with:
```r
for (i in seq_along(tablesExport)) {
  func(tablesExport[i], con = pool, envir = .GlobalEnv)
}
```
Note that I was explicit about passing both the connection object (your `pool`) and the environment in which you want the side-effect. I did this to be more declarative, as well as to support other uses (within non-global environments, who knows).
One thing this supports is operating without side-effects, working with the list of frames I mentioned in my comment. Instead of the `for` loop assigning each table to its own variable, you can then do:
```r
dat <- lapply(setNames(nm = tablesExport), func, con = pool, envir = NULL)
```
The `envir = NULL` is not strictly needed since that is already the default in the function definition, but I include it here to be perfectly clear: I don't want a side-effect. Using this `dat`, you would then use `dat[[ tablesExport[3] ]]` or (if one table is named `UserList`) `dat$UserList` to access individual tables.
A couple of notes about some of my style choices:
- I try not to use `1:length(...)`; while it works most of the time, if `length(.)` is ever `0`, one would hope that the `for` loop would do nothing, but unfortunately it sees `1:0`, which resolves to `c(1, 0)`, not an empty vector. `seq_along(x)` is equivalent to `seq_len(length(x))`, which is equivalent to `1:length(x)` except when `x` is empty. (There's a quick demonstration at the end of this answer.)
- I used `lapply(setNames(nm = tablesExport), ...)` since `lapply` names its output elements only if the object going in is named. While `sapply` does name things intuitively (it substitutes/deparses the vector/list), it can also return either a vector or a list, which can be frustrating for purely programmatic use. I prefer `lapply` for that reason. Alternatively, I could have used `sapply(tablesExport, func, con = pool, envir = NULL, simplify = FALSE)` for the same effect. (See the short naming demo at the end of this answer.)
- I defined `now` and `yearseconds` mostly because I try to resist hard-coded constants without a clear definition in code. Naming them something meaningful is useful and declarative. (I've had to debug modules that used, for instance, `10` in various places throughout the code ... some of them meant the same constant, some meant something different, so a global search-and-replace to update one of the constants failed.)
- I used `invisible` for the option of using the function with a side-effect; without `invisible`, `func(somename, ...)` would dump a frame to the console, which may not be what you want. `invisible` lets you return a value, but if the caller does not explicitly capture it into an object, it won't pummel the console with a frame rendering. When used as `somevar <- func(...)`, `invisible` has no effect at all. (There's a tiny demonstration at the end of this answer.)
- One last thing ... a more defensive version of `func` would check for the existence of those column names before arbitrarily filtering on them. One would then either (a) fail immediately, or (b) only filter on a column if the column is present. While (b) could still produce "interesting" effects in downstream code, it would at least allow the downstream code to execute. (A minimal sketch of option (b) follows below.)
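Since I mentioned option (b): here is a minimal sketch of what a more defensive `func` could look like. The helper name `func_safe` and the per-column checks are my own illustration, and the column names (`uploaded_at`, `private`, `excluded`, `export_time`) are simply the ones used above; adapt it to your actual schema.

```r
# Minimal sketch of option (b): only filter/drop columns that actually exist.
func_safe <- function(tblnm, con, envir = NULL) {
  now <- Sys.time()
  yearseconds <- 365 * 24 * 60 * 60
  dat <- data.frame(tbl(con, tblnm))
  have <- names(dat)
  # only apply the date/private filter if both columns are present
  if (all(c("uploaded_at", "private") %in% have)) {
    dat <- dat %>%
      mutate(exclDate = uploaded_at + yearseconds) %>%
      filter(!(exclDate > now & !(private %in% c(0, NA)))) %>%
      select(-exclDate)
  }
  # only apply the 'excluded' filter if that column is present
  if ("excluded" %in% have) {
    dat <- dat %>% filter(excluded %in% c(0, NA))
  }
  # drop the bookkeeping columns only when they exist
  dat <- dat %>% select(-any_of(c("excluded", "export_time", "private")))
  if (!is.null(envir)) assign(tblnm, dat, envir = envir)
  invisible(dat)
}
```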
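To illustrate the `seq_along` point, a quick demonstration with a hypothetical empty vector:

```r
x <- character(0)    # imagine tablesExport came back empty
1:length(x)          # c(1, 0): the loop would still run twice
seq_along(x)         # integer(0): the loop runs zero times
seq_len(length(x))   # integer(0): same thing
```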
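And the naming behaviour that motivates `setNames(nm = ...)`, shown with a toy vector (not your tables):

```r
nms <- c("alpha", "bravo")
lapply(nms, toupper)                 # unnamed list; elements only reachable by position
lapply(setNames(nm = nms), toupper)  # named list; elements reachable as $alpha, $bravo
```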
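Likewise, `invisible` in isolation, using a throwaway function:

```r
f <- function(x) invisible(x * 2)
f(3)          # prints nothing at the console
out <- f(3)   # the value is still returned and can be captured
out
# [1] 6
```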