
Does anyone know why I am getting this error? The attribute 'Kön' really is called 'Kön' in the training file, and the training file has no column named 'training'.

ctrl <- trainControl(method = "repeatedcv", number = 10, savePredictions = TRUE)


mod_fit <- train(as.factor(training$Riskdrickare)~., data=training, method="glm",family="binomial", trControl = ctrl, tuneLength = 5)


Error: Unknown columns 'training', '<U+FEFF>Kön'
In addition: There were 11 warnings (use warnings() to see them)

I am supposed to run this code afterwards:

pred = predict(mod_fit, newdata=test)

confusionMatrix(data=pred, as.factor(test$Riskdrickare))

The warnings state:

Warning messages:
1: In train.default(x, y, weights = w, ...) :
  You are trying to do regression and your outcome only has two possible values Are you trying to do classification? If so, use a 2 level factor as your outcome column.
2: glm.fit: fitted probabilities numerically 0 or 1 occurred
3: glm.fit: fitted probabilities numerically 0 or 1 occurred
4: glm.fit: fitted probabilities numerically 0 or 1 occurred
5: glm.fit: fitted probabilities numerically 0 or 1 occurred
6: glm.fit: fitted probabilities numerically 0 or 1 occurred
7: glm.fit: fitted probabilities numerically 0 or 1 occurred
8: glm.fit: fitted probabilities numerically 0 or 1 occurred
9: glm.fit: fitted probabilities numerically 0 or 1 occurred
10: glm.fit: fitted probabilities numerically 0 or 1 occurred
11: glm.fit: fitted probabilities numerically 0 or 1 occurred
12: glm.fit: fitted probabilities numerically 0 or 1 occurred

The column names are:

Kön Ålder Nationalitet Fakultet Riskdrickare Terminer Omtenta Sektionsaktiv studieTim Träning Frysmat Frukost Vegan Sömn Relation dator dAlc wAlc FamRel fHealth mhealth Smoke fSize fTog pStudEvent schemUnd drickStudenEven fritidVän FritidTid

They consist of both numerical and categorical values.

A reproducible example would be an equivalent dataset for this assignment.

If I apply as.factor(Riskdrickare) then the first error goes away but the rest remain.
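A minimal sketch of that idea, assuming the outcome is coded 0/1: convert the column to a factor inside the data frame first, then refer to it by its bare name in the formula, so the formula never mentions `training$...`. The toy data frame and the labels "no"/"yes" are made up for illustration, and base `glm` stands in for `train` so the snippet runs without caret; the same formula shape should carry over to `train(Riskdrickare ~ ., data = training, ...)`, though that is not verified here.

```r
# Toy stand-in for the training data; all values are made up.
set.seed(1)
training <- data.frame(
  Riskdrickare = rbinom(60, 1, 0.5),
  Terminer     = sample(1:10, 60, replace = TRUE),
  dAlc         = sample(1:5, 60, replace = TRUE)
)

# Convert the outcome to a two-level factor inside the data frame
# (assumes a 0/1 coding; the labels are hypothetical) ...
training$Riskdrickare <- factor(training$Riskdrickare,
                                levels = c(0, 1),
                                labels = c("no", "yes"))

# ... and use the bare column name on the left-hand side of the formula.
fit <- glm(Riskdrickare ~ ., data = training, family = binomial)

# The dot now expands to the remaining columns only.
predictors <- attr(terms(fit), "term.labels")
print(predictors)
```

With a factor outcome, caret should also treat the problem as classification rather than regression, which is what the first warning is asking for.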

Solved: Setting the file encoding to ANSI and removing the letters 'å', 'ä' and 'ö' from the column names made the error message go away.
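For anyone who would rather fix this in R than re-save the file, here is a rough sketch of the same cleanup done on the column names. The `<U+FEFF>` in the error is a byte order mark (BOM) stuck to the front of the first column name. `clean_names` is a hypothetical helper, not part of caret or base R, and the example names are typed in by hand rather than read from the real file.

```r
# Hypothetical column names as they might arrive from a UTF-8 file with a BOM.
# \ufeff is the BOM; \u00f6, \u00c5 and \u00e4 are ö, Å and ä.
raw_names <- c("\ufeffK\u00f6n", "\u00c5lder", "Tr\u00e4ning", "Riskdrickare")

# Hypothetical helper: drop a leading BOM and replace Swedish letters
# with ASCII stand-ins.
clean_names <- function(x) {
  x <- sub("^\ufeff", "", x)            # drop a leading byte order mark
  x <- gsub("[\u00e5\u00e4]", "a", x)   # å, ä -> a
  x <- gsub("\u00f6", "o", x)           # ö -> o
  x <- gsub("[\u00c5\u00c4]", "A", x)   # Å, Ä -> A
  x <- gsub("\u00d6", "O", x)           # Ö -> O
  x
}

cleaned <- clean_names(raw_names)
print(cleaned)
```

In practice this would be applied as `names(training) <- clean_names(names(training))`. If the file is UTF-8, reading it with `read.csv(..., fileEncoding = "UTF-8-BOM")` is another option, since that encoding name tells R to drop the mark at read time.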

sockevalley
  • caret I presume? If so you should add it as a tag, maybe along with glm. – Mike Wise May 07 '17 at 21:54
  • Can you show me what it should look like? I have library(caret) at the top of the script. – sockevalley May 07 '17 at 22:17
  • I did it for you then. – Mike Wise May 07 '17 at 22:19
  • I don't understand what you mean by adding a tag, maybe along with glm. – sockevalley May 08 '17 at 10:24
  • I meant the tags (meta-data), that are light blue under the question. Makes it easier for interested people to find your question as they look for those tags. In this case it is probably an issue with the way the `caret` package handles the `glm` method (these have the tags `r-caret` and `glm`). – Mike Wise May 08 '17 at 10:26
  • Oh I see, cheers then! Do you have any idea why I am getting this error? It is really annoying, since no one seems to have had it before me. Do you think there could be a problem with the names, such as 'Kön'? – sockevalley May 08 '17 at 10:34
  • Not really, would have to look and debug it and I don't have the time at the moment. But I think it is a good question, someone (maybe Max himself?) will eventually have a look I imagine. – Mike Wise May 08 '17 at 10:35
  • Please provide a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). – David Heckmann May 10 '17 at 00:02

0 Answers