
I have a dataset with 4 predictors and 3 responses. I am trying to fit a model with xgboost in R. However, when I define the DMatrix objects, I get the error below:

Error in setinfo.xgb.DMatrix(dmat, names(p), p[[1]]) : 
  The length of labels must equal to the number of rows in the input data

Here are 40 observations from my data:

structure(list(X1 = c(9, 6, 22.5, 11.5, 11.5, 7.5, 8.5, 8, 3.5, 
1.5, 6, 4.5, 4, 5, 4.5, 2, 3.5, 4.5, 4, 6.5, 8, 4, 3.5, 6, 3.5, 
5, 6, 4, 4.5, 3.5, 3.5, 3, 4.5, 5, 3, 3.5, 6.5, 5, 3.5, 5), X2 = c(1.7, 
1, 2.2, 1.2, 1.2, 1.4, 1.3, 1.5, 0.7, 0.4, 0.9, 0.9, 0.7, 0.9, 
0.9, 0.2, 0.9, 0.8, 0.7, 1, 1.5, 0.8, 0.5, 1.3, 0.9, 1.2, 1.2, 
0.9, 0.8, 0.6, 0.6, 0.8, 0.7, 1.3, 0.5, 0.6, 1.2, 0.9, 0.7, 0.9
), X3 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0), X4 = c(56L, 58L, 60L, 59L, 51L, 56L, 50L, 47L, 39L, 22L, 
50L, 44L, 37L, 42L, 31L, 26L, 40L, 41L, 40L, 44L, 58L, 40L, 39L, 
49L, 48L, 40L, 52L, 51L, 37L, 31L, 40L, 55L, 32L, 43L, 54L, 52L, 
49L, 37L, 38L, 29L), Y1 = c(48L, 59L, 59L, 58L, 56L, 53L, 42L, 
23L, 32L, 47L, 36L, 55L, 40L, 56L, 54L, 30L, 53L, 54L, 56L, 57L, 
58L, 58L, 54L, 54L, 44L, 55L, 62L, 29L, 59L, 62L, 49L, 59L, 65L, 
65L, 63L, 62L, 51L, 37L, 45L, 60L), Y2 = c(54L, 68L, 71L, 62L, 
60L, 60L, 62L, 44L, 35L, 43L, 35L, 63L, 48L, 54L, 43L, 28L, 35L, 
53L, 47L, 64L, 57L, 55L, 63L, 51L, 56L, 60L, 58L, 33L, 61L, 60L, 
58L, 55L, 65L, 64L, 54L, 52L, 56L, 54L, 39L, 62L), Y3 = c(58L, 
60L, 59L, 51L, 56L, 50L, 47L, 39L, 22L, 50L, 44L, 37L, 42L, 31L, 
26L, 40L, 41L, 40L, 44L, 58L, 40L, 39L, 49L, 48L, 40L, 52L, 51L, 
37L, 31L, 40L, 55L, 32L, 43L, 54L, 52L, 49L, 37L, 38L, 29L, 32L
)), row.names = c(NA, 40L), class = "data.frame")

I am trying the following code:

library(Matrix)   # sparse.model.matrix()
library(xgboost)  # xgb.DMatrix()

# Data partitioning
data <- read.csv("tests.csv", header = TRUE, sep = ",")
ind <- sample(2, nrow(data), replace = TRUE, prob = c(0.7, 0.3))
train <- data[ind == 1, ]
test <- data[ind == 2, ]
# One-hot encoding for the training and testing sets
trainm <- sparse.model.matrix(Y1 + Y2 + Y3 ~ X1 + X2 + X3 + X4, data = train)
train_label <- train[, c("Y1", "Y2", "Y3")]
testm <- sparse.model.matrix(Y1 + Y2 + Y3 ~ X1 + X2 + X3 + X4, data = test)
test_label <- test[, c("Y1", "Y2", "Y3")]
# DMatrix objects (this is where the error above is raised)
dtrain <- xgb.DMatrix(trainm, label = train_label)
dtest <- xgb.DMatrix(testm, label = test_label)
  • It seems this is not currently possible, at least directly; there is a [GitHub request](https://github.com/dmlc/xgboost/issues/2087) for this, which is [still open](https://github.com/dmlc/xgboost/issues/3439). A [workaround](https://stackoverflow.com/questions/39540123/muti-output-regression-in-xgboost) has been suggested, but it will work only for the Python API. – desertnaut Oct 30 '19 at 15:48
  • @desertnaut thank you so much! – mustafa Oct 31 '19 at 16:12
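
The error message says the label must have one value per row, so a three-column data frame is rejected. One way around this in R, sketched below, is to fit a separate model per response so that each `xgb.DMatrix` gets a plain numeric vector as its label. This is not the linked Python workaround itself, only a manual loop in the same spirit. The simulated data frame, the `"reg:squarederror"` objective, and `nrounds = 50` are assumptions for illustration and are not taken from the question; the three models are fitted independently, so correlations between the responses are not used.

```r
library(Matrix)
library(xgboost)

set.seed(1)
# Hypothetical stand-in for the question's data: 4 predictors, 3 responses.
data <- data.frame(
  X1 = runif(40), X2 = runif(40), X3 = runif(40), X4 = runif(40),
  Y1 = rnorm(40), Y2 = rnorm(40), Y3 = rnorm(40)
)
ind <- sample(2, nrow(data), replace = TRUE, prob = c(0.7, 0.3))
train <- data[ind == 1, ]
test <- data[ind == 2, ]

# Design matrices built from the predictors only; "- 1" drops the intercept column.
trainm <- sparse.model.matrix(~ X1 + X2 + X3 + X4 - 1, data = train)
testm <- sparse.model.matrix(~ X1 + X2 + X3 + X4 - 1, data = test)

responses <- c("Y1", "Y2", "Y3")

# Fit one independent regression model per response column.
models <- lapply(responses, function(y) {
  # The label is a single numeric vector with one value per training row.
  dtrain <- xgb.DMatrix(trainm, label = train[[y]])
  xgb.train(
    params = list(objective = "reg:squarederror"),  # assumed objective
    data = dtrain,
    nrounds = 50                                    # assumed number of rounds
  )
})
names(models) <- responses

# Collect predictions: one column per response, one row per test observation.
pred <- sapply(models, function(m) predict(m, xgb.DMatrix(testm)))
```

Here `pred[, "Y2"]` would hold the test-set predictions for `Y2`, which can be compared against `test$Y2` to evaluate that model on its own.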
