I get an error when creating a classification task with mlr:

library(mlr)
library(dplyr)
tree <- read.csv("file.csv", header = TRUE, na.strings = c("", "NA"))

# группы has 3 levels; collapse it into a 2-level factor
tree$hipo <- as.factor(tree$группы == 1)
df <- select(tree, -группы)  # drop the original grouping column
trainTask <- makeClassifTask(data = df, target = "hipo")

and I get

Error in makeClassifTask(data = df, target = "hipo") : 
  Assertion on 'data' failed: Columns must be named according to R's variable naming rules.

Then I tried the following:

tree <- read.csv("file.csv", header = TRUE, na.strings = c("", "NA"))
tree$группы <- as.factor(tree$группы == 1)
trainTask <- makeClassifTask(data = tree, target = "группы")

That works! So is the problem in select? I tried to reproduce it with a toy example:

df <- data.frame('пер' = c(1, 0, 2, 0, 1, 2), 'b' = c(1, 1, 0, 0, 1, 0), 'c' = c(1, 1, 0, 0, 1, 0))
str(df)
df$d <- as.factor(df$пер == 1)
df1 <- select(df, -пер)
trainTask <- makeClassifTask(data = df1, target = "d")

That works too! What could be the problem? Cyrillic names? I also checked the names with make.names.
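
One way to test the Cyrillic-name hypothesis directly is to look for non-ASCII characters in the column names, since make.names may leave such names unchanged depending on the locale. A minimal sketch in base R; the helper has_non_ascii is hypothetical and not part of mlr or dplyr:

```r
# Hypothetical helper: TRUE for each string that contains a code point above 127.
has_non_ascii <- function(x) {
  vapply(enc2utf8(x),
         function(s) any(utf8ToInt(s) > 127L),
         logical(1), USE.NAMES = FALSE)
}

# Column names from the toy example; "\u043f\u0435\u0440" is "пер".
nms <- c("\u043f\u0435\u0440", "b", "c")

flags <- has_non_ascii(nms)  # one logical per column name
nms[flags]                   # the names that contain non-ASCII characters
```

Running the same helper on names(df) for the real data would show which columns to look at first.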

Edward
  • What's the output of `names(df)` and `make.names(names(df))`? Are they identical? – Brian Aug 19 '17 at 20:26
  • Yes, `names(df)` and `make.names(names(df))` both return `"пол" "Do" "Sy" "Sp" "Lp" "Ie" "Fx" "hipo"`. The problem goes away when I change the Cyrillic names to Latin ones. – Edward Aug 19 '17 at 20:57
  • Can you post a complete example that reproduces the problem, please? – Lars Kotthoff Aug 19 '17 at 22:29
  • The Cyrillic names are the problem in the task generation, as you can easily see when you run `makeClassifTask(data = df, target = "c")` on your last small example. We could allow this internally, but I am always afraid that allowing non-Latin characters will break things in other places. – jakob-r Sep 04 '17 at 08:51
  • You can make syntactically valid names using the following approach: `colnames(df) <- make.names(colnames(df), unique = TRUE)` – timothyjgraham Oct 16 '18 at 04:54
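
Since the earlier comment shows make.names returning the Cyrillic name unchanged, an explicit rename to ASCII may be a more reliable workaround. A minimal sketch in base R, assuming generic placeholder names are acceptable; the col_ prefix and the name_map lookup are arbitrary choices for illustration, not anything mlr provides:

```r
# Toy data frame with one Cyrillic column name ("\u043f\u043e\u043b" is "пол").
df <- data.frame(x = c(1, 0, 2), Do = c(1, 1, 0),
                 hipo = factor(c(TRUE, FALSE, FALSE)))
names(df)[1] <- "\u043f\u043e\u043b"

# Flag names that contain any code point above 127 (non-ASCII).
non_ascii <- vapply(enc2utf8(names(df)),
                    function(s) any(utf8ToInt(s) > 127L),
                    logical(1), USE.NAMES = FALSE)

# Replace only the flagged names with position-based ASCII placeholders.
old_names <- names(df)
names(df)[non_ascii] <- paste0("col_", which(non_ascii))

# Lookup from the new ASCII names back to the originals, for reporting.
name_map <- setNames(old_names, names(df))

# df now has ASCII-only names and can be tried with
# makeClassifTask(data = df, target = "hipo").
```

Keeping name_map around makes it possible to relabel model output with the original Cyrillic names afterwards.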
