I have a functioning pylearn2 neural network which loads data from a csv and predicts a continuous target variable. How can I change it to predict multiple distinct target variables?

I am using Kaggle's African soil dataset and have constructed this functioning mlp.yaml file:

!obj:pylearn2.train.Train {
dataset: &train !obj:pylearn2.datasets.csv_dataset.CSVDataset {
    path: 'C:\Users\POWELWE\Git\pylearn2\pylearn2\datasets\soil\training_CA.csv',
    task: 'regression',
    start: 0,
    stop: 1024,
    expect_headers: True,
    num_outputs: 1
},
model: !obj:pylearn2.models.mlp.MLP {
    layers : [
        !obj:pylearn2.models.mlp.RectifiedLinear {
            layer_name: 'h0',
            dim: 200,
            irange: .05,
            max_col_norm: 2.
        },
        !obj:pylearn2.models.mlp.RectifiedLinear {
            layer_name: 'h1',
            dim: 200,
            irange: .05,
            max_col_norm: 2.
        },
        !obj:pylearn2.models.mlp.LinearGaussian {
            init_bias: !obj:pylearn2.models.mlp.mean_of_targets {
                dataset: *train },
            init_beta: !obj:pylearn2.models.mlp.beta_from_targets {
                dataset: *train },
            min_beta: 1.,
            max_beta: 100.,
            beta_lr_scale: 1.,
            dim: 1,
            layer_name: 'y',
            irange: .005
        }
    ],
    nvis: 3594,
},
algorithm: !obj:pylearn2.training_algorithms.bgd.BGD {
    line_search_mode: 'exhaustive',
    batch_size: 1024,
    conjugate: 1,
    reset_conjugate: 0,
    reset_alpha: 0,
    updates_per_batch: 10,
    monitoring_dataset:
        {
            'train' : *train,
            'valid' : !obj:pylearn2.datasets.csv_dataset.CSVDataset {
                path: 'C:\Users\POWELWE\Git\pylearn2\pylearn2\datasets\soil\training_CA.csv',
                task: 'regression',
                start: 1024,
                stop: 1156,
                expect_headers: True,
            }
        },
    termination_criterion: !obj:pylearn2.termination_criteria.MonitorBased {
        channel_name: "valid_y_mse",
        prop_decrease: 0.,
        N: 100
    },
},
extensions: [
    !obj:pylearn2.train_extensions.best_params.MonitorBasedSaveBest {
         channel_name: 'valid_y_mse',
         save_path: "${PYLEARN2_TRAIN_FILE_FULL_STEM}_best.pkl"
    },
],
save_path: "mlp.pkl",
save_freq: 1

}

For the purpose of predicting a single target variable, I removed all target variables from the dataset except Ca, and moved that to the first column. When I run the following command in the ipython console, it functions for that single variable:

%run 'C:\Users\POWELWE\Git\pylearn2\pylearn2\scripts\train.py' mlp.yaml

I would like to include the other 4 target variables (P, pH, SOC, Sand), but do not know how I can set my model to train on these additional targets. I assume I need to perform some manipulations of num_outputs, dim, or nvis, but haven't had any success in my attempts. This is a precursor project to one with many more target variables, so it is important that I train using a single network, rather than constructing a new network for each target variable.
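
To make the intended data layout concrete, here is a minimal NumPy sketch. It assumes (not confirmed here) that the loader treats the first `num_outputs` columns of each row as targets and the remaining columns as features, which is consistent with moving Ca to the first column for the single-target case. The helper name `split_targets` and the toy array are hypothetical, for illustration only.

```python
import numpy as np


def split_targets(data, num_outputs):
    """Split a 2-D array into (features, targets).

    Assumption: the targets occupy the first `num_outputs` columns and
    every remaining column is an input feature.
    """
    data = np.asarray(data, dtype=float)
    y = data[:, :num_outputs]   # e.g. Ca, P, pH, SOC, Sand
    X = data[:, num_outputs:]   # all remaining feature columns
    return X, y


# Toy stand-in for the CSV contents: 4 rows, 5 targets followed by 3 features.
rows = np.arange(32, dtype=float).reshape(4, 8)
X, y = split_targets(rows, num_outputs=5)
print(X.shape, y.shape)
```

Under that assumption, the number of input features (what `nvis` describes) does not change when more targets are added; only the width of the target matrix, and therefore of the output layer, grows.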

powellwe

1 Answer

To train a network that predicts the values of several variables at the same time, you just need to set up your network with multiple output neurons and feed it the training data the same way you do now, but with multiple target values per example. I have never used pylearn2; I prefer Caffe, nolearn (Lasagne), or PyBrain, and each of these libraries handles such cases easily.

Example of a PyBrain implementation (the code was used in Kaggle's BikeShare challenge):

http://pastebin.ru/tqpMTzIz
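
Independent of any particular library, the idea can be sketched in plain NumPy. This is only an illustration under stated assumptions, not pylearn2 or PyBrain code: the data is synthetic, the layer sizes and learning rate are arbitrary, and training is ordinary full-batch gradient descent on mean squared error. The point is that the output layer has one linear unit per target and the target matrix has one column per target.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (assumption): 256 samples, 10 features, 5 continuous targets.
n_samples, n_features, n_targets, n_hidden = 256, 10, 5, 32
X = rng.normal(size=(n_samples, n_features))
true_W = rng.normal(size=(n_features, n_targets))
Y = X @ true_W + 0.01 * rng.normal(size=(n_samples, n_targets))

# One hidden ReLU layer, then a linear output layer with one unit per target.
W1 = rng.normal(scale=0.1, size=(n_features, n_hidden))
b1 = np.zeros(n_hidden)
W2 = rng.normal(scale=0.1, size=(n_hidden, n_targets))
b2 = np.zeros(n_targets)


def forward(inputs):
    """Return hidden activations and predictions of shape (n, n_targets)."""
    hidden = np.maximum(0.0, inputs @ W1 + b1)
    return hidden, hidden @ W2 + b2


def mse(pred, target):
    """Mean squared error averaged over all samples and all targets."""
    return float(np.mean((pred - target) ** 2))


_, pred = forward(X)
initial_loss = mse(pred, Y)

learning_rate = 0.02
for _ in range(300):
    hidden, pred = forward(X)
    # Gradient of the MSE with respect to the predictions.
    grad_pred = 2.0 * (pred - Y) / Y.size
    grad_W2 = hidden.T @ grad_pred
    grad_b2 = grad_pred.sum(axis=0)
    # Backpropagate through the ReLU: no gradient where the unit was inactive.
    grad_hidden = grad_pred @ W2.T
    grad_hidden[hidden <= 0.0] = 0.0
    grad_W1 = X.T @ grad_hidden
    grad_b1 = grad_hidden.sum(axis=0)
    # In-place updates so forward() sees the new parameters.
    W1 -= learning_rate * grad_W1
    b1 -= learning_rate * grad_b1
    W2 -= learning_rate * grad_W2
    b2 -= learning_rate * grad_b2

_, pred = forward(X)
final_loss = mse(pred, Y)
print(pred.shape, initial_loss, final_loss)
```

Nothing else in the network changes when going from one target to five: the hidden layers are shared, and only the width of the last layer and of `Y` differs.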

Maksim Khaitovich
  • I understand logically what needs to happen. I'm just not sure how to implement it in pylearn2. I need to know how I can tell pylearn2 to train on multiple target variables (and produce results for all of them). Do you have any examples from one of these other libraries you could share? It might help me determine what I need to change in the pylearn2 implementation. – powellwe Aug 03 '15 at 16:29
  • @powellwe updated answer. This one I used on Kaggle bikeshare challenge – Maksim Khaitovich Aug 03 '15 at 16:46
  • @powellwe it may have a couple of bugs in the naming or the pandas part, but in general it is correct, at least in the pybrain part – Maksim Khaitovich Aug 03 '15 at 16:47
  • It makes sense to me. I'm trying to see whether I can transfer any of the logic to pylearn2. The main reason I'd like to stick with pylearn2 is that it uses GPU processing, which saves a significant amount of training time when I'm using a large dataset. Unfortunately, it's been very difficult for me to understand what all the code is doing because I primarily work with a .yaml file, rather than the Python code. – powellwe Aug 03 '15 at 17:02
  • @powellwe well, if pylearn2 is a well-designed library, it should have the same functionality as any other library and should support multi-output regression too. Though I'd really suggest you stick with the Caffe library: it is really cool and fast. – Maksim Khaitovich Aug 03 '15 at 17:17