How to scale stacked model approach for each country in data set?

Question

library(caret)

fitControl <- trainControl(
  method = "cv",
  number = 5,
  savePredictions = "final",
  classProbs = FALSE)

predictors<-c("Age", "Quantile","label1","label2")
outcomeName<-'Life_expt'

model_rf<-train(Life_expt ~ Age+Quantile+label1+label2,Train2[country==.BY],method='rf',trControl=fitControl,tuneLength=3)

Error in .prepareFastSubset(isub = isub, x = x, enclos = parent.frame(), : RHS of == is length 0 which is not 1 or nrow (559). For robustness, no recycling is allowed (other than of length 1 RHS). Consider %in% instead.

I am trying to scale this across each country. I would like to use a stacking approach with several models (rf, svmRadial, glm). How can I do this for each country without getting the error?

Thanks

Could you please give more details (i.e. what do you mean by ensembling and scaling?) and make your question [reproducible](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) by sharing some data/code? — OzanStats, Jul 23 '18 at 03:45
Your example is not reproducible because we're still missing sample data. That aside the part `Train2[country==.BY]` looks odd; what are `.BY` and `country` , and how are they defined? You should inspect `Train2[country==.BY]`. I imagine you want to subset `Train2`. Try `Train2[Train2$country == some_country_to_match, ]`. — Maurits Evers, Jul 23 '18 at 04:03
Could you add the output of `dput(head(Train2, 20))` for the first 20 rows of your data to your question? — Artem, Aug 13 '18 at 11:03

score 0 · Answer 1 · answered Aug 13 '18 at 19:45

You can scale by a factor (in your case, country) using the `by` function:

# Simulate a data frame (named src to avoid masking base::source)
set.seed(123)
src <- data.frame(Country = factor(rep(c("Morocco", "Egypt", "Somali"), each = 4)),
                  Age = sample(18:100, 12, replace = TRUE),
                  Quantile = sample(1:100 / 100, 12, replace = TRUE))

# Standardize Age and Quantile within each group of rows sharing the same country
df_lists <- by(data = src, INDICES = src$Country, FUN = function(x) {
  x$Scaled <- scale(x[, c("Age", "Quantile")])
  x
})

# Combine the per-country data frames from the list into one
Train2 <- do.call(rbind, df_lists)
Train2

Output:

          Country Age Quantile  Scaled.Age Scaled.Quantile
Egypt.5     Egypt  42     0.15 -0.68003487     -1.47468600
Egypt.6     Egypt  30     0.42 -1.03102061      0.62092042
Egypt.7     Egypt  97     0.42  0.92864977      0.62092042
Egypt.8     Egypt  92     0.37  0.78240571      0.23284516
Morocco.1 Morocco  72     0.76  0.43965779      1.47808959
Morocco.2 Morocco  76     0.22  1.14311024     -0.65035942
Morocco.3 Morocco  63     0.32 -1.14311024     -0.25620220
Morocco.4 Morocco  67     0.24 -0.43965779     -0.57152797
Somali.9   Somali  75     0.16  0.56497964     -0.61136845
Somali.10  Somali  84     0.14  0.88278069     -0.74355623
Somali.11  Somali  20     0.24 -1.37713788     -0.08261736
Somali.12  Somali  57     0.47 -0.07062246      1.43754204
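
If "scale" in the question means fitting the model once per country rather than standardizing columns, the same split-and-apply idea carries over. Below is a minimal sketch under some assumptions: `sim_data` and its made-up `Life_expt` column are simulated for illustration, and base R `glm()` stands in for `caret::train()` so the example runs without extra packages. Each per-country subset `d` could instead be passed to `train(..., data = d, trControl = fitControl)`, and the resulting models combined with a stacking tool of your choice.

```r
# Simulated data; Life_expt and its relation to the predictors are made up
set.seed(123)
sim_data <- data.frame(
  Country  = factor(rep(c("Morocco", "Egypt", "Somali"), each = 30)),
  Age      = sample(18:100, 90, replace = TRUE),
  Quantile = sample(1:100 / 100, 90, replace = TRUE)
)
sim_data$Life_expt <- 60 + 0.1 * sim_data$Age + 5 * sim_data$Quantile + rnorm(90)

# One data frame per country, named by factor level
by_country <- split(sim_data, sim_data$Country)

# Fit one model per country; glm() is a stand-in for caret::train()
models <- lapply(by_country, function(d) {
  glm(Life_expt ~ Age + Quantile, data = d)
})

# Coefficients side by side, one column per country
coefs <- sapply(models, coef)
coefs
```

Because `split()` returns a named list, `models[["Egypt"]]` gives the fit for a single country, and `predict(models[["Egypt"]], newdata = ...)` works as usual on it.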
