0

I am trying to create logit model in R but I have some problems.

My data looks like that

head(duomframe)

 DNSB.Ražuva X1 X1.1 X0 X0.1 X40 X8013.54 X1.2 X0.2 X0.3 X0.4 X0.5
1 UAB Antakalnio būstas  1    1  0    0  51   511.55    0    1    0    0    0
2 UAB Antakalnio būstas  1    0  1    0  54   519.46    0    1    0    0    0
3 UAB Antakalnio būstas  1    0  1    0  42   492.70    0    1    0    0    0
4 UAB Antakalnio būstas  1    0  1    0  51   515.68    0    0    0    0    0
5 UAB Antakalnio būstas  1    0  1    0  49  2308.78    0    1    0    0    0
6 UAB Antakalnio būstas  1    0  1    0  63   381.75    0    1    0    0    0
  X0.6 X7197.16 X78.23 X4 X1.3 X0.7 X0.8 X7783.31 X2 X1.4 X42.22 X14 X33.33
1    0     0.00  86.80  2    4    0    0   173.36  1    1  58.31   5 189.79
2    0     0.00  53.67  1    2    0    3   204.85  0    1  66.29   2 140.00
3    0     0.00  52.13  2    3    0    0   160.73  0    2  93.69   2 119.03
4    1   415.68  45.19  3    1    2    0   641.54  0    1  53.56   6 102.11
5    0     0.00 103.44  3    1    3    0   113.08  0    2 122.45   5 527.61
6    0     0.00  49.75  4    3    0    0   384.62  0    3  75.09   3  69.46
  X0.9 X12
1    0   4
2    1   3
3    2   4
4    0   6
5    0   5
6    1   4

I change type of variable to numeric:

duomframe[,-1] <- lapply(duomframe[,-1], function(x) {as.numeric(gsub(",", "", x)

and these ones wich are binomial i change to factor as i found that i should do that

gyvenvietefac<-as.factor(gyvenviete)
vyrasfac<-as.factor(vyras)
moterisfac<-as.factor(moteris)
juridinisfac<-as.factor(juridinis)
dirbamafac<-as.factor(dirbama)
atsiskaitytafac<-as.factor(atsiskaityta)
nerateisiniofac<-as.factor(nerateisinio)
nesuejesterminasfac<-as.factor(nesuejesterminas)
notarasfac<-as.factor(notaras)
perrasytasfac<-as.factor(perrasytas)

And than I am trying to create model :

glm.out = glm((moketa_menesiu/laikotarpis) ~ gyvenvietefac + vyrasfac + moterisfac + 
               juridinisfac + amzius + dirbamafac + atsiskaitytafac + nerateisiniofac +
               nesuejesterminasfac + notarasfac + perrasytasfac + (neapmoketaslikutis/skola) + (neapmoketaslikutis/einamojiskola) + 
               moketa_kartu + plotas + gyvsk + kiekvekseliuturejo + kiekperrasytas + (mokejimuvidurkis/priskaitymuvidurkis), 
               gyvenvietefac="binomial", vyrasfac="binomial", moterisfac="binomial", juridinisfac="binomial", dirbamafac="binomial", atsiskaitytafac="binomial", nerateisiniofac="binomial", 
               nesuejesterminasfac="binomial", notarasfac="binomial", perrasytasfac="binomial", data=duomframe[,1])

But in this way R gives me back an error :

Error in glm.control(gyvenvietefac = "binomial", vyrasfac = "binomial",  : 
  unused arguments (gyvenvietefac = "binomial", vyrasfac = "binomial", moterisfac = "binomial", juridinisfac = "binomial", dirbamafac = "binomial", atsiskaitytafac = "binomial", nerateisiniofac = "binomial", nesuejesterminasfac = "binomial", notarasfac = "binomial", perrasytasfac = "binomial")

I found another way that i should use binomial("logit") instead of "binomial" but than it gives me an error :

Error in glm.control(gyvenvietefac = list(family = "binomial", link = "logit",  : 
  unused arguments (gyvenvietefac = list(family = "binomial", link = "logit", linkfun = function (mu) 
.Call(C_logit_link, mu), linkinv = function (eta) 
.Call(C_logit_linkinv, eta), variance = function (mu) 
mu * (1 - mu), dev.resids = function (y, mu, wt) 
.Call(C_binomial_dev_resids, y, mu, wt), aic = function (y, n, mu, wt, dev) 
{
    m <- if (any(n > 1)) n else wt
    -2 * sum(ifelse(m > 0, (wt/m), 0) * dbinom(round(m * y), round(m), mu, log = TRUE))
}, mu.eta = function (eta) 
.Call(C_logit_mu_eta, eta), initialize = expression({
    if (NCOL(y) == 1) {
        if (is.factor(y)) y <- y != levels(y)[1]
        n <- rep.int(1, nobs)
        y[weights == 0] <- 0
        if (any(y < 0 | y > 1)) stop("y values must be 0 <= y <= 1")
        mustart <- (weights * y + 0.5)/(weights + 1)
        m <- weights * y
        if (any(abs(m - round(m)) > 0.001)) warning("non-integer #successes in a binomial glm!")
    } else if (NCOL(y) == 2) {
        if (any(abs(y - round(y)) > 0.001)

Please advice how i should use this one. Thank you

Lauryna
  • 11
  • 5

2 Answers2

1

It is hard to replicate errors without a reproducible data.

Try this:

glm.out = glm(formula = (moketa_menesiu/laikotarpis) ~ gyvenvietefac + 
                vyrasfac + moterisfac + juridinisfac + amzius + dirbamafac + 
                atsiskaitytafac + nerateisiniofac + nesuejesterminasfac + 
                notarasfac + perrasytasfac + (neapmoketaslikutis/skola) + 
                (neapmoketaslikutis/einamojiskola) + moketa_kartu + plotas + 
                gyvsk + kiekvekseliuturejo + kiekperrasytas + 
                (mokejimuvidurkis/priskaitymuvidurkis),
              family=binomial(logit),
              data=duomframe[,1])
Community
  • 1
  • 1
zx8754
  • 52,746
  • 12
  • 114
  • 209
  • 1
    Would `data=duomframe[,1]` not just pass one column? – user20650 Aug 19 '14 at 10:32
  • But than it gives me this error: Error in eval(expr, envir, enclos) : y values must be 0 <= y <= 1 – Lauryna Aug 19 '14 at 10:34
  • 1
    What is the structure of both `moketa_menesiu` and `laikotarpis`. ie `str(moketa_menesiu)`. A logistic regression expects binomial data (0/1) as your dependent variable – user20650 Aug 19 '14 at 10:38
  • Also when you are converting to `factor` you are not reassigning the variable back to your data. You should use `duomframe$gyvenvietefac<-as.factor(duomframe$gyvenviete)`. Or are these not even in `duomframe` ? – user20650 Aug 19 '14 at 10:43
  • > str(moketa_menesiu) num [1:421] 5 2 2 6 5 3 11 10 10 10 ... > str(laikotarpis) num [1:421] 4 3 4 6 5 4 6 6 6 6 .... It's numeric. So these errors appeaer because i choose not binomial variables as dependent ones? – Lauryna Aug 19 '14 at 10:44
  • It is duomframe[,1] because of this duomframe[,-1] <- lapply(duomframe[,-1], function(x) {as.numeric(gsub(",", "", x))}) – Lauryna Aug 19 '14 at 10:52
  • Normally, you pass binomial data for logistic regression but you can also pass proportions; if you pass successes and failures in your formula as `glm(cbind(successes, failures) ~ `. I think you should have a look online as there are lots of worked examples on logistic regression [Here is one to get you started](http://ww2.coastal.edu/kingw/statistics/R-tutorials/logistic.html) – user20650 Aug 19 '14 at 12:00
0

The standard code for the logit model is for 3 variables:

logit.ll <- function(beta,y,x){
z <- beta[1] + x %*% beta[2:3]
p <- 1/(1+exp(-z))
prob <- ifelse (y==1,p,1-p)
-sum(log(prob))
}

Then for the table of results you can use the following:

covvar <- solve(m$hessian)
t.stat <- estimate/sd
p.value <- 2*pnorm(-abs(t.stat))