I want to estimate an SUR (Seemingly Unrelated Regressions) model.

I tried using systemfit and its wrapper Zelig. But I am not able to understand how to specify factors to be projected out (i.e., add fixed effects) and cluster the standard errors, like we do in felm().

Also, if I simply add the fixed effect variables to my regression equations, then I get the following error:

Error in LU.dgC(a) : cs_lu(A) failed: near-singular A (or out of memory)

Thank you so much for your help!

Here is a sample that mimics the structure of my data:

Y_var1 <- c(0.45, 0.40, 0.30, 0.40, 0.15, 0.35, 0.50, 0.55, 0.10, 0.15, 0.30, 0.10)
Y_var2 <- c(0.40, 0.25, 0.45, 0.30, 0.35, 0.25, 0.15, 0.25, 0.35, 0.30, 0.20, 0.15)
X_var1 <- c(0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0)
X_var2 <- c(0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0)
X_var3 <- c(0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1)
X_var4 <- c(0.18, 0.18, 0.18, 0.20, 0.20, 0.20, 0.22, 0.22, 0.22, 0.24, 0.24, 0.24)
X_var5 <- c(0.08, 0.08, 0.08, 0.06, 0.06, 0.06, 0.04, 0.04, 0.04, 0.02, 0.02, 0.02)
X_var6 <- c(-0.25, -0.25, -0.25, 1.30, 1.30, 1.30, 1.80, 1.80, 1.80, 2.25, 2.25, 2.25)
X_var7 <- c(1000, 1000, 1000, 1500, 1500, 1500, 2000, 2000, 2000, 2500, 2500, 2500)
X_var8 <- c('ABC', 'ABC', 'ABC', 'MNO', 'MNO', 'MNO', 'DEF', 'DEF', 'DEF', 'XYZ', 'XYZ', 'XYZ')
X_var9 <- c(2000, 2010, 2020, 2000, 2010, 2020, 2000, 2010, 2020, 2000, 2010, 2020)

sample_data <- data.frame(Y_var1, Y_var2, X_var1, X_var2, X_var3, X_var4, X_var5, X_var6, X_var7, X_var8, X_var9)

library(systemfit)
formula <- list(mu1 = Y_var1 ~ X_var1*X_var3 + X_var2*X_var3 + X_var4 + X_var5 + X_var6 + log(X_var7), 
                mu2 = Y_var2 ~ X_var1*X_var3 + X_var2*X_var3 + X_var4 + X_var5 + X_var6 + log(X_var7))

fitsur <- systemfit(formula = formula, data=sample_data, method = "SUR")
fitols <- systemfit(formula = formula, data=sample_data, method = "OLS")

(Because this is only a sample dataset, the two regressions above produce the error I mentioned earlier. They work fine on my actual data.)
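To see why the sample data triggers the near-singular error, it can help to inspect the design matrix directly. The following is a minimal sketch in base R, assuming the same regressor values as in the sample above (rebuilt compactly with rep()); the names rhs, n_cols, x_rank and zero_cols are only illustrative. It compares the numerical rank of the design matrix with its number of columns and lists any all-zero columns.

```r
# Rebuild the regressors of the sample data (same values as above).
sample_data <- data.frame(
  X_var1 = c(0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0),
  X_var2 = c(0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0),
  X_var3 = c(0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1),
  X_var4 = rep(c(0.18, 0.20, 0.22, 0.24), each = 3),
  X_var5 = rep(c(0.08, 0.06, 0.04, 0.02), each = 3),
  X_var6 = rep(c(-0.25, 1.30, 1.80, 2.25), each = 3),
  X_var7 = rep(c(1000, 1500, 2000, 2500), each = 3)
)

# Right-hand side shared by both equations.
rhs <- ~ X_var1*X_var3 + X_var2*X_var3 + X_var4 + X_var5 + X_var6 + log(X_var7)
X <- model.matrix(rhs, data = sample_data)

n_cols <- ncol(X)       # number of coefficients requested per equation
x_rank <- qr(X)$rank    # numerical rank of the design matrix
zero_cols <- colnames(X)[colSums(X != 0) == 0]  # columns that are 0 in every row

cat("columns:", n_cols, " rank:", x_rank, "\n")
cat("all-zero columns:", zero_cols, "\n")
```

A rank below the column count means some regressors are linear combinations of others. In this sample, for instance, X_var4 + X_var5 equals 0.26 in every row, so the two are collinear with the intercept.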

However, what I am interested in is estimating the above formula using SUR, with X_var8 and X_var9 fixed effects and standard errors clustered at X_var8 level.

If we use felm(), the specification is

felm(Y_var1 ~ X_var1*X_var3 + X_var2*X_var3 + X_var4 + X_var5 + X_var6 + log(X_var7) | X_var8 + X_var9 | 0 | X_var8, data = sample_data)

However, as my standard errors are correlated across equations, I need to use SUR.

Any help would be much appreciated. Thank You!

Anisha Garg
  • It would be more helpful if there were a minimal example of how you estimate the SUR model, so that the problem can be seen from the example's output. – Jovan Jul 27 '21 at 11:33
  • So, in SUR (Seemingly Unrelated Regressions), we estimate multiple regression equations together, where the error term is correlated across equations. The command used is: systemfit(formula, data, method = "SUR"), where formula is a list of all the regression equations, and data is the dataframe that we want to use. – Anisha Garg Jul 27 '21 at 13:03
  • Yes, I mean you can add the full set of commands, starting from importing the data into R and building the SUR model, up to the point where you get the final error. – Jovan Jul 27 '21 at 13:13
  • I have added an example, as I cannot share the actual data due to privacy concerns. I hope my issue is now clear. Thank you so much! (P.S.: Happy to connect over email to discuss the issue in detail.) – Anisha Garg Jul 27 '21 at 14:47
  • I tested your example and found the culprit: the interaction terms (X_var1*X_var3 + X_var2*X_var3) do not seem to work with SUR, or even OLS, on this data. It worked fine when I changed them to X_var1 + X_var2, with no interactions. – Jovan Jul 27 '21 at 16:13
  • Is there something you want to achieve by adding the interaction terms? Looking at the values of var1, var2, and var3, these seem to be boolean, so I think it is best to compute the products directly as new variables and add those to the formula to prevent the error. – Jovan Jul 27 '21 at 16:28
  • Oh yes! I just checked that this works for the sample data. Thank you! I was able to estimate the equations using my actual data, though, so this issue might be due to the sample data. The issue I am actually facing is about adding fixed effects and clustered standard errors to the estimation. Can you please help me with that? – Anisha Garg Jul 27 '21 at 16:34
  • In brief, I want to use the felm() specification in SUR estimation. – Anisha Garg Jul 27 '21 at 16:38
  • Hmm, I have no recent experience with SUR estimation, but looking at the core problem: felm() comes from another package called lfe, right? And systemfit() comes from the systemfit package. Looking at the documentation carefully, I guess the SUR model in systemfit() does not really support fixed effects, unlike felm(). – Jovan Jul 27 '21 at 16:49
  • As far as I can see, the SUR model in systemfit() does support numeric variables. Can you point me to an example of a SUR model with fixed effects, so I can rethink the problem? In the meantime, I will look into it again tomorrow. – Jovan Jul 27 '21 at 16:55
  • That is the issue - I could not find anything. So, I thought about using lm() specification, where we add fixed effects in the regression equation itself. But that is giving me the error specified above. So, I tried again by removing the interaction term. Still no luck! As for clustered standard errors, I think summary(fitsur, cluster = c("X_var8")) might work. Thanks again for all your effort! :) It really means a lot. Hope you are able to find something – Anisha Garg Jul 27 '21 at 16:59
  • I guess the parameter cluster = c("X_var8") is kind of misleading. It does not have any effect, since the result of summary(fitsur) is the same as that of summary(fitsur, cluster = c("X_var8")). – Jovan Jul 27 '21 at 17:12
  • Yes, I am happy to help :) I will check later whether there is a solution that comes close to what you expect. – Jovan Jul 27 '21 at 17:13
  • Thanks a lot! :) If you find anything in Stata or Python, that would also work. :) – Anisha Garg Jul 28 '21 at 04:58
  • I have already posted an answer. Does it address your issue? – Jovan Jul 28 '21 at 07:51
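
On the clustering point raised in the comments: when every equation has the same regressors, as in the question, the SUR (GLS) point estimates coincide with equation-by-equation OLS. Under that assumption, a cluster-robust covariance matrix for the whole system can be sketched by hand in base R. The function cluster_vcov_system() below is a hypothetical helper, not part of systemfit or lfe; it computes a plain sandwich estimate without any small-sample correction, and the data are simulated purely for illustration.

```r
# Cluster-robust (CR0) covariance for m equations that share one design matrix.
# X:  n x k design matrix, U: n x m matrix of OLS residuals, cl: cluster ids.
# Rows and columns of the result are ordered equation by equation.
cluster_vcov_system <- function(X, U, cl) {
  m <- ncol(U)
  bread <- kronecker(diag(m), solve(crossprod(X)))
  # Per-observation scores x_i * u_ij, one block of k columns per equation.
  S <- do.call(cbind, lapply(seq_len(m), function(j) X * U[, j]))
  Sg <- rowsum(S, cl)        # sum the scores within each cluster
  meat <- crossprod(Sg)
  bread %*% meat %*% bread
}

set.seed(42)
# Simulated data: 8 clusters of 5 observations, errors correlated
# within clusters and across the two equations.
cl <- rep(1:8, each = 5)
n <- length(cl)
x <- rnorm(n)
shock <- rnorm(8)[cl] + rnorm(n)
y1 <- 1 + 2 * x + shock + rnorm(n)
y2 <- -1 + 0.5 * x + shock + rnorm(n)

fit <- lm(cbind(y1, y2) ~ x)   # both equations by OLS in one call
X <- model.matrix(fit)
U <- residuals(fit)

V <- cluster_vcov_system(X, U, cl)
se <- sqrt(diag(V))            # (Intercept), x for y1, then the same for y2
print(round(se, 4))
```

The off-diagonal blocks of V hold the covariances between coefficients of different equations, which is what a cross-equation test needs. Fixed effects could enter X as dummy columns; whether this matches what felm() reports depends on its degrees-of-freedom corrections, which this sketch does not replicate.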

1 Answer

I think I now understand how to add fixed effects to the SUR model:

  1. First, transform X_var8 into numeric columns with one-hot encoding. I also create new variables for the interaction terms in your formula above:

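    library(data.table)  # provides as.data.table()
    library(mltools)     # provides one_hot()

    # one_hot() only encodes factor columns, so convert X_var8 first
    sample_data$X_var8 <- factor(sample_data$X_var8)

    sample_data2 <- as.data.frame(one_hot(as.data.table(sample_data)))

    # Interaction terms from the original formula, built by hand
    sample_data2$X_var13 <- sample_data2$X_var1 * sample_data2$X_var3

    sample_data2$X_var23 <- sample_data2$X_var2 * sample_data2$X_var3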

  2. Check the values of sample_data2$X_var13 and sample_data2$X_var23 closely:

    sample_data2$X_var13

    [1] 0 0 0 0 0 0 0 0 0 0 0 0

    sample_data2$X_var23

    [1] 0 0 0 0 0 1 0 0 0 0 0 0

In the sample data, sample_data2$X_var13 is 0 in every row. An all-zero column carries no information and makes the design matrix singular, which triggers the error Error in LU.dgC(a) : cs_lu(A) failed: near-singular A (or out of memory). We can therefore discard it here, but feel free to use it with the real data.

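  3. Build the formulas with the fixed effects added. One dummy (here X_var8_XYZ) is left out as the reference group, because all four dummies together with the intercept would be perfectly collinear. Note that X_var9 enters as a numeric trend here; year fixed effects would need year dummies in the same way:

    formula <- list(mu1 = Y_var1 ~ X_var23 + X_var4 + X_var5 + X_var6 + log(X_var7) + X_var8_ABC + X_var8_DEF + X_var8_MNO + X_var9,
                    mu2 = Y_var2 ~ X_var23 + X_var4 + X_var5 + X_var6 + log(X_var7) + X_var8_ABC + X_var8_DEF + X_var8_MNO + X_var9)

  4. Fit the SUR model and print the summary: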

    fitsur <- systemfit(formula = formula, data=sample_data2, method = "SUR")

    summary(fitsur)
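
An alternative to adding dummy columns is to project the fixed effects out before estimation, which is the idea behind felm(). The following is a minimal sketch under stated assumptions: the panel is simulated and balanced (every group observed in every year), so the simple two-way demeaning formula is exact, and demean2() along with all variable names is hypothetical. It first checks in base R that the demeaned regression reproduces the dummy-variable coefficient, then passes the demeaned variables to systemfit() if that package is installed.

```r
set.seed(1)

# Hypothetical balanced panel: 6 groups, each observed in 4 years.
d <- expand.grid(year = 1:4, group = 1:6)
n <- nrow(d)
d$x <- rnorm(n)
d$z <- rnorm(n)
common <- rnorm(n)  # shared shock, so errors correlate across equations
d$y1 <- 1.5 * d$x + d$group + 0.5 * d$year + common + rnorm(n)
d$y2 <- -0.7 * d$z - d$group + 0.2 * d$year + common + rnorm(n)

# Two-way within transformation; exact only for a balanced panel.
demean2 <- function(v, g, tm) v - ave(v, g) - ave(v, tm) + mean(v)

for (nm in c("x", "z", "y1", "y2")) {
  d[[paste0(nm, "_w")]] <- demean2(d[[nm]], d$group, d$year)
}

# The slope from the demeaned regression equals the slope with dummies.
b_within <- coef(lm(y1_w ~ x_w - 1, data = d))[["x_w"]]
b_dummy  <- coef(lm(y1 ~ x + factor(group) + factor(year), data = d))[["x"]]
print(c(within = b_within, dummies = b_dummy))

# SUR on the demeaned variables (no intercept after demeaning).
if (requireNamespace("systemfit", quietly = TRUE)) {
  eqs <- list(mu1 = y1_w ~ x_w - 1, mu2 = y2_w ~ z_w - 1)
  fit_sur <- systemfit::systemfit(eqs, method = "SUR", data = d)
  print(coef(fit_sur))
}
```

The point estimates are unaffected by the transformation, but the standard errors reported by summary() would not account for the degrees of freedom absorbed by the fixed effects, and they are not clustered, so they should be treated with caution.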

Jovan
  • Wow! Converting it to a numeric worked! :) I did not use one_hot(). Instead I just created a factor variable and converted it to numeric. So, now the class is numeric. However, for adding fixed effects, we need to convert it to factor(X_var8numeric) and factor(X_var9). But when I do this, I again get the same error. – Anisha Garg Jul 28 '21 at 09:40
  • The reason I used one-hot encoding is to represent each unique class of the factor as its own numeric column. If you convert the factor to numeric directly, the classes are coded as the sequence 1, 2, 3, 4, and so on, which makes some classes appear larger than others (and that does not make sense in this case). – Jovan Jul 29 '21 at 02:04
  • I know that you want to add fixed effects, but unfortunately factor variables did not work with systemfit() when I tested your example, so you will need another approach to model them. – Jovan Jul 29 '21 at 02:09
  • You need not worry: there is no major difference between a factor and a numeric variable when it is converted properly, only a different representation. Some models work better with numeric variables, and that is the case for systemfit(). – Jovan Jul 29 '21 at 02:29
  • If my answer helped you, I would appreciate it if you could mark it as accepted :) – Jovan Jul 29 '21 at 03:03
  • Yes, I get your point and created dummy variables for all the values. However, I am still getting the same error. – Anisha Garg Jul 29 '21 at 06:35
  • What kind of error do you get? Would you like to discuss this privately? – Jovan Jul 29 '21 at 07:42
  • Yes, that would be helpful. – Anisha Garg Jul 29 '21 at 10:24
  • Sorry, I guess there is no way to send a private message here, or at least I don't know how, so here is my email: jovanshadowz@gmail.com – Jovan Jul 29 '21 at 12:05