I have a working pylearn2 neural network that loads data from a CSV file and predicts a single continuous target variable. How can I change it to predict multiple distinct target variables?
I am using Kaggle's African soil dataset, and have constructed this working MLP YAML file:
!obj:pylearn2.train.Train {
    dataset: &train !obj:pylearn2.datasets.csv_dataset.CSVDataset {
        path: 'C:\Users\POWELWE\Git\pylearn2\pylearn2\datasets\soil\training_CA.csv',
        task: 'regression',
        start: 0,
        stop: 1024,
        expect_headers: True,
        num_outputs: 1
    },
    model: !obj:pylearn2.models.mlp.MLP {
        layers: [
            !obj:pylearn2.models.mlp.RectifiedLinear {
                layer_name: 'h0',
                dim: 200,
                irange: .05,
                max_col_norm: 2.
            },
            !obj:pylearn2.models.mlp.RectifiedLinear {
                layer_name: 'h1',
                dim: 200,
                irange: .05,
                max_col_norm: 2.
            },
            !obj:pylearn2.models.mlp.LinearGaussian {
                init_bias: !obj:pylearn2.models.mlp.mean_of_targets {
                    dataset: *train },
                init_beta: !obj:pylearn2.models.mlp.beta_from_targets {
                    dataset: *train },
                min_beta: 1.,
                max_beta: 100.,
                beta_lr_scale: 1.,
                dim: 1,
                layer_name: 'y',
                irange: .005
            }
        ],
        nvis: 3594,
    },
    algorithm: !obj:pylearn2.training_algorithms.bgd.BGD {
        line_search_mode: 'exhaustive',
        batch_size: 1024,
        conjugate: 1,
        reset_conjugate: 0,
        reset_alpha: 0,
        updates_per_batch: 10,
        monitoring_dataset:
            {
                'train' : *train,
                'valid' : !obj:pylearn2.datasets.csv_dataset.CSVDataset {
                    path: 'C:\Users\POWELWE\Git\pylearn2\pylearn2\datasets\soil\training_CA.csv',
                    task: 'regression',
                    start: 1024,
                    stop: 1156,
                    expect_headers: True,
                }
            },
        termination_criterion: !obj:pylearn2.termination_criteria.MonitorBased {
            channel_name: "valid_y_mse",
            prop_decrease: 0.,
            N: 100
        },
    },
    extensions: [
        !obj:pylearn2.train_extensions.best_params.MonitorBasedSaveBest {
            channel_name: 'valid_y_mse',
            save_path: "${PYLEARN2_TRAIN_FILE_FULL_STEM}_best.pkl"
        },
    ],
    save_path: "mlp.pkl",
    save_freq: 1
}
To predict a single target variable, I removed every target column from the dataset except Ca and moved Ca to the first column. When I run the following command in the IPython console, training works for that single variable:
%run 'C:\Users\POWELWE\Git\pylearn2\pylearn2\scripts\train.py' mlp.yaml
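The column preparation described above (keep Ca, drop the other targets, put Ca first) can be sketched roughly as follows. This is only an illustration of the idea, not the exact script I used: the function name targets_first and the tiny made-up table are hypothetical, and the real file has thousands of feature columns.

```python
def targets_first(rows, keep, drop=()):
    """Return a copy of rows with the `keep` columns moved to the front.

    rows: list of lists, where rows[0] is the header row.
    keep: target column names to place first, in this order.
    drop: column names to remove entirely.
    """
    header = rows[0]
    keep_idx = [header.index(name) for name in keep]
    skip = set(keep_idx) | {header.index(name) for name in drop}
    # Every column that is neither kept as a target nor dropped is a feature.
    rest_idx = [i for i in range(len(header)) if i not in skip]
    order = keep_idx + rest_idx
    return [[row[i] for i in order] for row in rows]


# Made-up miniature table standing in for the real CSV contents.
rows = [
    ["f1", "Ca", "f2", "P"],
    ["0.1", "1.5", "0.2", "2.5"],
]

# Current single-target layout: Ca first, P removed.
single = targets_first(rows, keep=["Ca"], drop=["P"])

# The same helper with several targets kept at the front.
multi = targets_first(rows, keep=["Ca", "P"])
```

The same helper would let me keep all five targets as the leading columns, which is the layout I am assuming a multi-target setup would need.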
I would like to include the other 4 target variables (P, pH, SOC, Sand), but do not know how I can set my model to train on these additional targets. I assume I need to perform some manipulations of num_outputs, dim, or nvis, but haven't had any success in my attempts. This is a precursor project to one with many more target variables, so it is important that I train using a single network, rather than constructing a new network for each target variable.
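For reference, this is the bookkeeping I have been assuming when trying different values. It is a sketch of my assumption only, not confirmed pylearn2 behaviour: I am assuming that CSVDataset treats the first num_outputs columns as targets and every remaining column as an input feature, and that the output layer's dim should match the number of targets. The helper name layer_sizes and the shortened header are hypothetical.

```python
def layer_sizes(header, targets):
    """Derive candidate num_outputs, dim and nvis values from a CSV header.

    Assumption (mine, not verified against pylearn2): the targets are the
    leading columns, and all remaining columns are input features.
    """
    n = len(targets)
    if list(header[:n]) != list(targets):
        raise ValueError("target columns must be the first columns of the file")
    return {
        "num_outputs": n,          # target columns read by the dataset
        "dim": n,                  # units in the output layer
        "nvis": len(header) - n,   # input features seen by the first layer
    }


# Shortened, made-up header: five targets followed by three features.
header = ["Ca", "P", "pH", "SOC", "Sand", "f1", "f2", "f3"]
sizes = layer_sizes(header, ["Ca", "P", "pH", "SOC", "Sand"])
```

Under this assumption nvis would not change when targets are added back, because the number of feature columns stays the same; only num_outputs and dim would grow.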