I am wondering what the most memory-efficient way to initialize a list in R is when that list is going to be used in a loop to store results. I know that growing an object in a loop can cause a serious hit in computational efficiency, so I am trying to avoid that as much as possible.
My problem is as follows. I have several groups of data that I want to process individually. The gist of my code is that I have a loop that runs through each group one at a time, does some t-tests, and then returns only the statistically significant results (thus variable-length results for each group). So far I am initializing a list of length(groups) to store the results of each iteration.
My main question is how I should be initializing the list so that the object is not grown in the loop.
- Is it good enough to do
  list = vector(mode = "list", length = length(groups))
  for the initialization?
  - I am skeptical about this because it just creates a list of length(groups) but each entry is equal to NULL. My concern is that during each iteration of the loop, when I go to store data into the list, it is going to recopy the object each time as the entry goes from NULL to my results vector, in which case initializing the list doesn't really do much good. I don't know how the internals of a list work, however, so it is possible that it just stores the reference to the vector being stored in the list, meaning recopying is not necessary.
- The other option would be to initialize each element of the list to a vector of the maximum possible length the results could have.
  - This is not a big issue as the maximum number of possible valid results is known. If I took this approach I would just overwrite each vector with the results vector within the loop. Since the maximum amount of memory would already be reserved, hopefully no recopying/growth would occur. I don't want to take this approach, however, if it is not necessary and the first option above is good enough.
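To make the two options concrete, here is a minimal sketch of both initializations side by side. The data in groups and the bound maxLen are hypothetical stand-ins invented for illustration, not my real data.

```r
# Hypothetical stand-in data: three groups of different sizes
groups <- list(a = 1:4, b = 1:10, c = 1:2)
maxLen <- 10L  # assumed known upper bound on the number of results per group

# Option 1: a list with one NULL placeholder per group
resultsNull <- vector(mode = "list", length = length(groups))

# Option 2: every element pre-filled with a vector of the maximum length
resultsFull <- replicate(length(groups), numeric(maxLen), simplify = FALSE)

# Both lists have one slot per group; they differ only in what each slot holds
length(resultsNull)   # 3
length(resultsFull)   # 3
lengths(resultsNull)  # 0 0 0
lengths(resultsFull)  # 10 10 10
```

In both cases the list itself already has its final length before the loop starts; the open question is only whether filling the slots triggers extra copying.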
Below is some pseudocode describing my problem:
# groups and tTestFunction() are assumed to be defined elsewhere
# initialize variables
results <- vector(mode = "list", length = length(groups))  # the line of code in question

# perform analysis on each group in groups
for (i in seq_along(groups)) {
  # returns a vector of p-values with one entry per element in the group
  tTests <- tTestFunction(groups[[i]])
  # keep only the significant p-values, so the length varies per group
  results[[i]] <- tTests[tTests <= 0.05]
}
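For anyone who wants to run this, a self-contained version of the loop above might look like the following. The simulated groups and the body of tTestFunction are hypothetical stand-ins (a one-sample t-test per element), chosen only so the loop has something to work on.

```r
set.seed(1)  # make the simulated data reproducible

# Hypothetical data: each group is a list of numeric samples
groups <- list(
  g1 = list(rnorm(20, mean = 1), rnorm(20), rnorm(20, mean = 2)),
  g2 = list(rnorm(20), rnorm(20, mean = 3))
)

# Hypothetical stand-in: one-sample t-test per element, returning p-values
tTestFunction <- function(group) {
  vapply(group, function(x) t.test(x)$p.value, numeric(1))
}

# Preallocate one slot per group, then fill the slots by index
results <- vector(mode = "list", length = length(groups))
for (i in seq_along(groups)) {
  pValues <- tTestFunction(groups[[i]])
  results[[i]] <- pValues[pValues <= 0.05]  # variable-length result
}

length(results)   # still one slot per group
lengths(results)  # number of significant results in each group
```

The list keeps its preallocated length throughout; only the contents of each slot change from NULL to a vector of p-values.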