I'm trying to manipulate a large data.table (~37 MB), but in a special way: for other (unrelated) reasons I have implemented a 'hook'-like structure, meaning that the overall process looks like this:
1) load the data.table from disk
2) fire a certain hook
3) the hook structure looks for this name and checks whether the user (= me :)) has bound a function to this hook; if so, that function is called
4) the data is processed further
The functions look like this:
data = readRDS(pathToFile)
data = data.table(data)
fireHook("After_data_read", data, [some other parameters])
some_more_processing(data)
and the region around fireHook looks like
.global.hooksRegistered = list(
  "After_data_read" = function(data, ...) {
    # do some stuff
  }
)
fireHook = function(hookName, ...) {
  for (hookNameRegistered in names(.global.hooksRegistered)) {
    if (hookName == hookNameRegistered) {
      func = .global.hooksRegistered[[hookName]]
      func(hookName, ...)
    }
  }
}
Note that an object that already is a data.table has to be converted to a data.table again after loading (otherwise pass-by-reference does not work); see "Adding new columns to a data.table by-reference within a function not always working" and "Pass by reference bug?".
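For illustration, here is a minimal sketch of that by-reference behaviour after a round trip through disk. It uses setDT() in place of data.table(data), on the assumption that setDT() restores the internal self-reference in place without copying; the temporary file, the column names, and the helper add_flag are made up for this example.

```r
library(data.table)

# Hypothetical round trip through disk, standing in for the real file.
path <- tempfile(fileext = ".rds")
saveRDS(data.table(id = 1:3), path)

data <- readRDS(path)
setDT(data)  # assumption: re-establishes over-allocation without a full copy

# Hypothetical hook-like function that adds a column by reference.
add_flag <- function(dt) {
  dt[, flag := TRUE]
  invisible(NULL)
}

add_flag(data)

# If by-reference modification worked, the caller's table has the new column.
has_flag <- "flag" %in% names(data)
print(has_flag)
```

If has_flag comes out FALSE, the column was added to a copy inside the function rather than to the caller's table.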
Problem: the line func(hookName, ...) takes forever (> 5 minutes).
The debugger never really gets into the function (so it's not the code inside the function that takes a long time). I've tested it with small data.tables, and it worked. I also noticed that the following seems to work:
fireHook = function(hookName, ...) {
  args = list(...)
  for (hookNameRegistered in names(.global.hooksRegistered)) {
    if (hookName == hookNameRegistered) {
      func = .global.hooksRegistered[[hookName]]
      func(hookName, args)
    }
  }
}
(notice that I substituted ... by list(...)). To me, it seems as if R is trying to copy the whole table when using ... . Is this right/desired? Or am I using it wrong?
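A quick way to test the copy hypothesis would be to compare the memory address of the table outside and inside a function that receives it through ... . The sketch below uses address() from data.table; the forward helper and the tiny table are hypothetical stand-ins for fireHook and the real data.

```r
library(data.table)

dt <- data.table(x = 1:5)

# Hypothetical pass-through, mimicking how fireHook forwards `...`.
forward <- function(f, ...) f(...)

addr_outside <- address(dt)
addr_inside <- forward(function(data, ...) address(data), dt)

# Equal addresses mean the same object arrived inside, i.e. no copy was made.
same_object <- identical(addr_outside, addr_inside)
print(same_object)
```

If the addresses match for the real table as well, the time is probably being spent somewhere other than in copying the arguments.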
regards,
FW