
I have a function with many parameters. Rather than setting all of them manually, I want to perform a grid search. I have a list of possible values for each parameter. For every possible combination of parameters, I want to run my function, which reports the performance of my algorithm on those parameters. I want to store the results in a many-dimensional matrix, so that afterwards I can just find the index of the maximum performance, which would in turn give me the best parameters. Here is how the code is written now:

param1_list = [p11, p12, p13,...]
param2_list = [p21, p22, p23,...] # not necessarily the same number of values
...

results_size = (len(param1_list), len(param2_list),...)
results = np.zeros(results_size, dtype=float)

for param1_idx in range(len(param1_list)):
  for param2_idx in range(len(param2_list)):
    ...
    param1 = param1_list[param1_idx]
    param2 = param2_list[param2_idx]
    ...
    results[param1_idx, param2_idx, ...] = my_func(param1, param2, ...)

max_index = np.argmax(results) # indices of best parameters!

I want to keep the first part, where I define the lists as-is, since I want to easily be able to manipulate the values over which I search.

I also want to end up with the results matrix as is, since I will be visualizing how changing different parameters affects the performance of the algorithm.

The bit in the middle, though, is quite repetitive and bulky (especially because I have lots of parameters, and I might want to add or remove parameters), and I feel like there should be a more succinct/elegant way to initialize the results matrix, iterate over all of the indices, and set the appropriate parameters.

So, is there?

user2398029
dlants

4 Answers


You can use ParameterGrid from the sklearn (scikit-learn) module:

http://scikit-learn.org/stable/modules/generated/sklearn.grid_search.ParameterGrid.html

Example

from sklearn.grid_search import ParameterGrid
param_grid = {'param1': [value1, value2, value3], 'paramN': [value1, value2, valueM]}

grid = ParameterGrid(param_grid)

for params in grid:
    your_function(params['param1'], params['param2'])
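If you also want to fill the results matrix from the question, the same cartesian-product iteration can carry the indices along. A minimal sketch using `itertools.product` over enumerated lists (ParameterGrid iterates the cartesian product in the same spirit), with a toy `my_func` standing in for the real one:

```python
import itertools
import numpy as np

# toy stand-ins for the real parameter lists and scoring function
param1_list = [0.1, 0.5, 0.9]
param2_list = [1, 2]

def my_func(p1, p2):  # hypothetical performance measure
    return p1 * p2

results = np.zeros((len(param1_list), len(param2_list)))
for (i, p1), (j, p2) in itertools.product(enumerate(param1_list),
                                          enumerate(param2_list)):
    results[i, j] = my_func(p1, p2)

# index of the best parameter combination, as in the original code
best = np.unravel_index(np.argmax(results), results.shape)
```

Adding or removing a parameter only means adding or removing one `enumerate(...)` argument, rather than another level of nested loops.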
Sibelius Seraphini
    New version of the module: `from sklearn.model_selection import ParameterGrid` – Abramodj Feb 14 '18 at 10:05
  • Also if you want to add randomization to some parameters of the grid you can use: `sklearn.model_selection.ParameterSampler` instead of `ParameterGrid`. – Mostafa Hadian May 12 '20 at 14:10

I think scipy.optimize.brute is what you're after.

>>> from scipy.optimize import brute
>>> x0, fval, grid, Jout = brute(my_func, ranges, full_output=True)

Note that each element of ranges must be a slice object or a (low, high) pair rather than an arbitrary list of values, and that brute minimizes, so a performance score you want to maximize must be negated. If the full_output argument is True, the evaluation grid (grid) and the function values on it (Jout) are returned as well.
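A runnable sketch, with a toy `my_func` (negated because `brute` minimizes) and slice objects defining the grid; `finish=None` skips the local-optimization polishing step that `brute` runs by default:

```python
import numpy as np
from scipy.optimize import brute

def my_func(params):
    p1, p2 = params
    return -(p1 * p2)  # brute minimizes, so negate the score to maximize it

# each range is a slice object: start, stop (exclusive), step
ranges = (slice(0.0, 1.0, 0.25), slice(1.0, 3.0, 1.0))

# x0 is the best grid point, jout the grid of function values
x0, fval, grid, jout = brute(my_func, ranges, full_output=True, finish=None)
```

Here `jout` plays the role of the results matrix: `jout.shape` is `(4, 2)`, one entry per evaluated parameter combination.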

John Vinyard
  • I simplified the situation above a bit. I am actually reporting multiple values as output (several evaluation functions), so my results line is results[p1idx, p2idx, ... , :, :, :] = 3d matrix of output values. I don't think I can use the bruteforce search from scipy because of this. Your solution is strictly correct to the problem I posted above, but I am looking for ways to simplify that code to create the result matrix without resorting to an existing gridsearch function. – dlants Nov 13 '12 at 23:42
  • Is it the case that you're trying to find the best input parameters for one of many scalar outputs, e.g., "Give me the inputs that minimize result[i]", or do you have some way of evaluating the "goodness" of all the results at once, like the sum, or l1 or l2 norm? – John Vinyard Nov 13 '12 at 23:49
  • 1
    The values are accuracy, precision and recall for various objects. I'll be taking the strict min of the accuracy within each object category, and across objects, and combining the precision and recall measures in various ways. – dlants Nov 13 '12 at 23:59
  • @dlants /John - I am solving a similar problem. Since you will be combining params in multiple ways, your my_func will have the COMBINED optimization function, right? Like a >= 0.7, and find min of b and max of c. Say I need a Jaccard similarity score of at least 70%, then optimize over the true positives & false negatives, and finally give me the thresholds (which I get by querying the index). If you have seen this elsewhere, please redirect – ekta May 30 '14 at 08:46
  • I posted the problem here, http://stackoverflow.com/questions/23993422/writing-the-objective-function-and-constraints-for-scipy-optimize-minimize-from?noredirect=1#comment36975036_23993422 Please consider answering. – ekta Jun 03 '14 at 03:52

The solutions from John Vinyard and Sibelius Seraphini are good built-in options, but if you're looking for more flexibility, you could use broadcasting plus vectorize. Use ix_ to produce a broadcastable set of parameter arrays, and then pass those to a vectorized version of the function (but see the caveat below):

import numpy

a, b, c = range(3), range(3), range(3)

def my_func(x, y, z):
    return (x + y + z) / 3.0, x * y * z, max(x, y, z)

grids = numpy.vectorize(my_func)(*numpy.ix_(a, b, c))
mean_grid, product_grid, max_grid = grids

With the following results for mean_grid:

array([[[ 0.        ,  0.33333333,  0.66666667],
        [ 0.33333333,  0.66666667,  1.        ],
        [ 0.66666667,  1.        ,  1.33333333]],

       [[ 0.33333333,  0.66666667,  1.        ],
        [ 0.66666667,  1.        ,  1.33333333],
        [ 1.        ,  1.33333333,  1.66666667]],

       [[ 0.66666667,  1.        ,  1.33333333],
        [ 1.        ,  1.33333333,  1.66666667],
        [ 1.33333333,  1.66666667,  2.        ]]])

product grid:

array([[[0, 0, 0],
        [0, 0, 0],
        [0, 0, 0]],

       [[0, 0, 0],
        [0, 1, 2],
        [0, 2, 4]],

       [[0, 0, 0],
        [0, 2, 4],
        [0, 4, 8]]])

and max grid:

array([[[0, 1, 2],
        [1, 1, 2],
        [2, 2, 2]],

       [[1, 1, 2],
        [1, 1, 2],
        [2, 2, 2]],

       [[2, 2, 2],
        [2, 2, 2],
        [2, 2, 2]]])

Note that this may not be the fastest approach. vectorize is handy, but it's limited by the speed of the function passed to it, and Python-level functions are slow. If you can rewrite my_func in terms of NumPy ufuncs, you can get your grids faster. Something like this:

>>> def mean(a, b, c):
...     return (a + b + c) / 3.0
... 
>>> mean(*numpy.ix_(a, b, c))
array([[[ 0.        ,  0.33333333,  0.66666667],
        [ 0.33333333,  0.66666667,  1.        ],
        [ 0.66666667,  1.        ,  1.33333333]],

       [[ 0.33333333,  0.66666667,  1.        ],
        [ 0.66666667,  1.        ,  1.33333333],
        [ 1.        ,  1.33333333,  1.66666667]],

       [[ 0.66666667,  1.        ,  1.33333333],
        [ 1.        ,  1.33333333,  1.66666667],
        [ 1.33333333,  1.66666667,  2.        ]]])
senderle

You may use NumPy's meshgrid for this:

import numpy as np

x = range(1, 5)
y = range(10)

xx, yy = np.meshgrid(x, y)
results = my_func(xx, yy)

Note that your function must be able to work with NumPy arrays. Also, np.meshgrid defaults to 'xy' indexing, which swaps the first two axes of the output; pass indexing='ij' if you want results[i, j] to correspond to (x[i], y[j]).
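A minimal runnable sketch, with a toy elementwise `my_func` standing in for the real one:

```python
import numpy as np

x = np.arange(1, 5)
y = np.arange(10)

def my_func(xx, yy):  # toy scorer that already works elementwise on arrays
    return xx * yy

# indexing='ij' keeps results[i, j] aligned with (x[i], y[j]);
# the default 'xy' indexing would transpose the first two axes
xx, yy = np.meshgrid(x, y, indexing='ij')
results = my_func(xx, yy)
```

With this layout, `results.shape == (len(x), len(y))`, matching the convention used in the question.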

Nils Werner