
First of all, I want to say that I have no clue about R or coding in general. I just have to run a regression with clustered standard errors for my bachelor thesis, and I can't do that in Excel. I managed to do the simple linear regression with clustered standard errors, but the multiple regression (even without clustering) gives me the error message "cannot allocate vector of size 4.7 Gb". I have a 64-bit Windows 7 version running on my PC with 8 GB of RAM available, and R does see all of it:
> memory.limit()
[1] 8168

This is the call I use and the error message R spits out:

library(biglm)
mregt <- biglm(GAAP.ETR ~ TIME + ADVERTISING.EXPENSE + INTANGIBLE.ASSETS + LEVERAGE +
               LOG.ASSETS + PP.E + R.D.EXPENSE + SPECIAL.ITEMS,
               data = Control.Variables)
Error: cannot allocate vector of size 4.7 Gb
In addition: Warning messages:
1: In array(c(rep.int(c(1, numeric(n)), n - 1L), 1), d, dn) :
  Reached total allocation of 8168Mb: see help(memory.size)
2: In array(c(rep.int(c(1, numeric(n)), n - 1L), 1), d, dn) :
  Reached total allocation of 8168Mb: see help(memory.size)
3: In array(c(rep.int(c(1, numeric(n)), n - 1L), 1), d, dn) :
  Reached total allocation of 8168Mb: see help(memory.size)
4: In array(c(rep.int(c(1, numeric(n)), n - 1L), 1), d, dn) :
  Reached total allocation of 8168Mb: see help(memory.size)
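
I also tried to work out where a 4.7 Gb allocation could even come from. Here is a sketch I pieced together (no idea if this is the right diagnosis): it builds the design matrix by hand and checks its size, since a factor column with many levels would blow it up:

# Build the design matrix manually and check its size.
# If TIME (or another column) is a factor with many levels,
# ncol(X) explodes and so does the memory needed.
X <- model.matrix(~ TIME + ADVERTISING.EXPENSE + INTANGIBLE.ASSETS + LEVERAGE +
                    LOG.ASSETS + PP.E + R.D.EXPENSE + SPECIAL.ITEMS,
                  data = Control.Variables)
dim(X)                               # rows x columns actually used in the fit
print(object.size(X), units = "Gb")  # memory taken by the design matrix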

As you can see in the call, I'm already trying to use the biglm package; either I'm doing it wrong (very likely) or it just doesn't work out.
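
From what I read, biglm is apparently meant to be fed the data in chunks and then updated, rather than handed everything at once. Something like this sketch (the number of chunks is my guess, and with factor columns every chunk would need to contain all levels, which I'm not sure my data satisfies):

library(biglm)
form <- GAAP.ETR ~ TIME + ADVERTISING.EXPENSE + INTANGIBLE.ASSETS + LEVERAGE +
        LOG.ASSETS + PP.E + R.D.EXPENSE + SPECIAL.ITEMS
# split the rows round-robin into 4 chunks
chunks <- split(Control.Variables, rep(1:4, length.out = nrow(Control.Variables)))
mregt <- biglm(form, data = chunks[[1]])   # fit on the first chunk
for (i in 2:length(chunks)) {
  mregt <- update(mregt, chunks[[i]])      # fold in the remaining chunks
}
summary(mregt)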

The dataset I'm using has 38,104 observations and 10 columns (38,104 × 10).
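
If my math is right, the raw numbers are nowhere near 4.7 Gb, so whatever allocates that much must be some intermediate matrix, not the data itself:

38104 * 10 * 8 / 1024^2   # the data itself: ~2.9 Mb if every value is an 8-byte numeric
sqrt(4.7 * 1024^3 / 8)    # 4.7 Gb of doubles would be a square matrix of ~25,000 x 25,000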

The function that I used for clustering the simple regression is this:

# Two-way cluster-robust standard errors (the Cameron/Gelbach/Miller
# approach, as in the Petersen-style code floating around online).
mcl <- function(dat, fm, cluster1, cluster2) {
  # 'dat' is kept for the original call signature; it isn't needed here,
  # since the fitted model 'fm' carries its own data
  library(sandwich); library(lmtest)
  cluster12 <- paste(cluster1, cluster2, sep = "")
  M1  <- length(unique(cluster1))    # clusters in dimension 1
  M2  <- length(unique(cluster2))    # clusters in dimension 2
  M12 <- length(unique(cluster12))   # clusters in the intersection
  N   <- length(cluster1)            # observations
  K   <- fm$rank                     # estimated coefficients
  # small-sample degrees-of-freedom corrections
  dfc1  <- (M1  / (M1  - 1)) * ((N - 1) / (N - K))
  dfc2  <- (M2  / (M2  - 1)) * ((N - 1) / (N - K))
  dfc12 <- (M12 / (M12 - 1)) * ((N - 1) / (N - K))
  # sum the estimating equations within each cluster;
  # compute estfun(fm) once instead of three times to save memory
  ef <- estfun(fm)
  u1j  <- apply(ef, 2, function(x) tapply(x, cluster1,  sum))
  u2j  <- apply(ef, 2, function(x) tapply(x, cluster2,  sum))
  u12j <- apply(ef, 2, function(x) tapply(x, cluster12, sum))
  # one-way cluster-robust covariance matrices
  vc1  <- dfc1  * sandwich(fm, meat = crossprod(u1j)  / N)
  vc2  <- dfc2  * sandwich(fm, meat = crossprod(u2j)  / N)
  vc12 <- dfc12 * sandwich(fm, meat = crossprod(u12j) / N)
  # two-way covariance: add the one-way matrices, subtract the intersection
  vcovMCL <- vc1 + vc2 - vc12
  coeftest(fm, vcovMCL)
}

I then call it with:

mcl(All, regt, All$Company.Name, All$Data.Year...Fiscal)
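
(Here regt is the plain lm fit that the clustered errors are computed from; for the simple regression that worked, it was something along these lines, with the right-hand side just a placeholder and not my exact formula:)

regt <- lm(GAAP.ETR ~ LEVERAGE, data = All)   # placeholder formula for illustration
mcl(All, regt, All$Company.Name, All$Data.Year...Fiscal)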

I checked some posts here and on other sites and tried a couple of things, but it just gives me the same error message. Again, I really have no clue about R and coding, so I really need the simplest way to do this :D

  • So fitting a model throws an error. How does clustering come into play in all this? – Roman Luštrik Mar 18 '16 at 12:04
  • I haven't even gotten to the clustering function for the multiple regression, because the multiple regression (without clustering) already gives me the error. I just thought I'd share the next step on my list, so that I don't fix the first problem only to have the clustering throw another error. – Copiloc Mar 18 '16 at 12:10
  • You are literally running out of memory, obviously. – Has QUIT--Anony-Mousse Mar 18 '16 at 16:39
  • Well, I figured that out. Obviously I want to solve this problem, though. – Copiloc Mar 18 '16 at 17:10
  • Prove that it works on a reduced data set first and see what sort of memory it requires, then extrapolate to see what you need. This might tell you whether this is even solvable as is, or whether you need to try something very different. – Harry Mar 18 '16 at 19:56
  • Well, as I said, I did the same thing with a regression of just one x and one y variable, and it worked fine. How can I see the memory required to run that? – Copiloc Mar 18 '16 at 21:12
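
(In case it helps anyone: one way I found to measure that, though I'm not sure it's the canonical one, is gc() with its reset argument:)

gc(reset = TRUE)    # reset the "max used" counters
fit <- lm(GAAP.ETR ~ LEVERAGE, data = small_subset)  # the small fit that works; 'small_subset' is hypothetical
gc()                # the "max used" column now shows peak memory since the reset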

0 Answers