
I am trying to solve a large quadratic programming problem using the IBM CPLEX Python API. On a Linux machine with 120 GB of RAM, I run out of memory while constructing the qMat matrix, which defines the quadratic part of the problem.

In the sample code below, qMat has 10 rows. In the actual problem I have about 100,000 rows, and each row has 10 to 1,000 non-zero columns. The machine runs out of memory while constructing qMat, which is a list of lists, each of which holds two lists (one giving the columns, the other the values). In other words, the machine runs out of memory even before it loads the problem into CPLEX. It would help if you could suggest another way of building qMat and/or another way of loading such data into CPLEX. I can of course write qMat line by line into a text file, but can such a file be loaded into CPLEX?

The code is structured the way it is because qMatIndexes needs to be computed only once, since net is fixed, whereas qMat itself needs to be computed many times because it depends on mDict. I basically need to run the CPLEX optimization many times, for many different mDicts, with a sparse Q matrix of dimension 100,000 × 100,000. It is this sparse Q matrix that qMat represents.
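For reference, here is a toy sketch (made-up 3 × 3 numbers, unrelated to the real data) of the per-row [columns, values] format that qMat uses, which is the format Cplex.objective.set_quadratic accepts. Both triangles of the symmetric Q have to be present, which is why create_qMat_Disk below mirrors every upper-triangle entry into the lower triangle:

# Toy symmetric 3x3 Q:
#     [[2.0, 0.5, 0.0],
#      [0.5, 2.0, 1.0],
#      [0.0, 1.0, 4.0]]
# One [columns, values] pair per variable; both triangles are listed.
qMat_toy = [
    [[0, 1], [2.0, 0.5]],
    [[0, 1, 2], [0.5, 2.0, 1.0]],
    [[1, 2], [1.0, 4.0]],
]
# my_prob.objective.set_quadratic(qMat_toy)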

from __future__ import division

import collections

import numpy as np
import cjson
import cplex

net = collections.OrderedDict([(1, [1, 2, 4]), (2, [2, 3, 4]), (3, [3, 4]), (4, [1, 4])])
mDict = {1:10,2:20,3:30,4:40}

def write_qMat_indexes(net):
    # Flatten the fixed network into two parallel coordinate lists:
    # i_indiciesRow[k] is the 1-based row of the k-th nonzero and
    # j_indiciesColumn[k] is its column (a node ID from net).
    i_indiciesRow = []
    j_indiciesColumn = []

    count = 0
    for ID in net:
        for s in set(net[ID]):
            i_indiciesRow.append(count + 1)
            j_indiciesColumn.append(s)
        count += 1

    with open('i_indiciesRow.txt', 'w') as file_i_indiciesRow:
        file_i_indiciesRow.write(cjson.encode(i_indiciesRow))

    with open('j_indiciesColumn.txt', 'w') as file_j_indiciesColumn:
        file_j_indiciesColumn.write(cjson.encode(j_indiciesColumn))

def QMatrixDisk():
    # Reload the coordinate lists written by write_qMat_indexes().
    with open('i_indiciesRow.txt', 'r') as file:
        for line in file:
            iN_list = cjson.decode(line)

    with open('j_indiciesColumn.txt', 'r') as file:
        for line in file:
            jN_list = cjson.decode(line)

    # For each distinct column ID, record every position where it occurs.
    jN_array = np.array(jN_list)
    jN_locations = {}
    for j in set(jN_list):
        locations = np.where(jN_array == j)[0]
        # Cast to plain ints so cjson can encode them later.
        jN_locations[j] = [int(loc) for loc in locations]
    del jN_array
    print "done finding locations"

    # For every nonzero position r, store its upper-triangle partners
    # (their positions and row IDs), one JSON-encoded line per position.
    with open('qMatIndexes.txt', 'w') as file:
        for r in xrange(len(jN_list)):
            columns = jN_locations[jN_list[r]]
            firstID_list = []
            columnsList = []
            for c in columns:
                if c >= r:
                    firstID_list.append(iN_list[c])
                    columnsList.append(c)
            file.write(cjson.encode((columnsList, firstID_list)) + '\n')




def create_qMat_Disk(mDict):
    # Reload the fixed coordinate lists.
    with open("i_indiciesRow.txt", 'r') as file:
        for line in file:
            iN_list = cjson.decode(line)

    with open("j_indiciesColumn.txt", 'r') as file:
        for line in file:
            jN_list = cjson.decode(line)

    # One qMat row per line of qMatIndexes.txt.
    with open('qMatIndexes.txt') as qfile:
        num_lines = sum(1 for line in qfile)

    qMat = [[[], []] for _ in range(num_lines)]
    count = 0
    with open('qMatIndexes.txt', 'r') as nfile:
        for line in nfile:
            columns, q_list = cjson.decode(line)
            baseAgent = jN_list[count]
            firstAgent = iN_list[count]
            # Scale this row's coefficients by the mDict-dependent factor.
            multiple = 2.0 * mDict[firstAgent] / mDict[baseAgent]
            vals = [mDict[q] * multiple for q in q_list]
            qMat[count][0] += columns
            qMat[count][1] += vals

            # Mirror the strictly-upper-triangle entries into the lower
            # triangle so that Q is stored symmetrically.
            for k in xrange(len(columns)):
                col = columns[k]
                if col > count:
                    qMat[col][0].append(count)
                    qMat[col][1].append(vals[k])

            count += 1

    return qMat



# Build the fixed index files once; qMat itself is rebuilt per mDict below.
write_qMat_indexes(net)
QMatrixDisk()


# Linear objective, bounds, right-hand sides, and constraint senses for
# the 10-variable sample problem.
C = [-6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -4.0, -4.0, -4.0, -4.0]
my_ub = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
my_lb = [0.03333333333333333, 0.03333333333333333, 0.03333333333333333, 0.03333333333333333, 0.03333333333333333, 0.03333333333333333, 0.05, 0.05, 0.05, 0.05]
my_rhs = [1, 1, 1, 1, 1.6]
my_sense = 'EEEEL'


# Constraint matrix W in coordinate (row, column, value) form.
W_rows = [0, 0, 0, 1, 1, 1, 2, 2, 3, 3, 4, 4, 4, 4]
W_columns = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 3, 6, 9]
W_values = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]

qMat = create_qMat_Disk(mDict)
print len(qMat), "len qMat"


my_prob = cplex.Cplex()
my_prob.parameters.qpmethod.set(4)  # 4 = barrier
my_prob.parameters.lpmethod.set(4)  # 4 = barrier
my_prob.objective.set_sense(my_prob.objective.sense.minimize)
my_prob.variables.add(obj=C, ub=my_ub, lb=my_lb)
my_prob.linear_constraints.add(rhs=my_rhs, senses=my_sense)
my_prob.linear_constraints.set_coefficients(zip(W_rows, W_columns, W_values))

my_prob.objective.set_quadratic(qMat)

my_prob.solve()

solutionValues = my_prob.solution.get_values()
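One optional addition (not in the original post): it is usually worth checking the solution status before reading the values, since barrier runs on problems of this size can stop without reaching optimality:

print my_prob.solution.get_status_string()  # e.g. "optimal"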
user58925
  • I don't know the specifics of the problem you are trying to solve, but if you have a good understanding of your problem you can: (1) eliminate data which you know is unnecessary, thus alleviating the load on memory, or (2) find some tight-ish relaxation of your problem which uses a subset of the data and still provides a near-optimal solution. – Philippe Olivier Sep 13 '18 at 12:07
  • Thanks Philippe. Did as much (1) as I saw possible. Cannot do (2) because I am working with networks: interlinkages matter. – user58925 Sep 13 '18 at 12:10
  • How do you create the huge list? You could try to preallocate `qmat = [list(range(1000)) for _ in range(100_000)]` and then fill in the needed values. Such a `qmat` allocates about 3 GB on my machine, so I think there is something wrong with how you create the entries. – rocksportrocker Sep 13 '18 at 13:46
  • Thanks, there is one more list in every row of qmat, so it's more like `n = 100000; m = 1000; qmat = [[list(range(m)), list(range(m))] for _ in range(n)]` – user58925 Sep 13 '18 at 15:18
  • Your method of creating a list of lists lets me create a qmat with a memory footprint of about 5.5 GB. Let me try to implement this in my code and see if it works... – user58925 Sep 13 '18 at 15:39
  • If the preallocation technique works for you, and the values returned by `range` are not actually meaningful (i.e., you just overwrite them with other values), then you might get much better performance doing something like `[[[None] * m, [None] * m]] * n`. See [this](https://stackoverflow.com/a/311783/1718477) stackoverflow answer. – rkersh Sep 13 '18 at 16:11
  • There is also [Cplex.objective.set_quadratic_coefficients](https://www.ibm.com/support/knowledgecenter/SSSA5P_12.8.0/ilog.odms.cplex.help/refpythoncplex/html/cplex._internal._subinterfaces.ObjectiveInterface-class.html#set_quadratic_coefficients), to set the coefficients one by one, but that would be extremely inefficient compared to using `Cplex.objective.set_quadratic` (see the sketch after this comment thread). – rkersh Sep 13 '18 at 16:37
  • I added more details of the problem I am working on, perhaps this will convey some of the details needed to develop a memory saving solution. I am not sure how to implement the preassigned qMat in this context. – user58925 Sep 13 '18 at 17:01
  • @rkersh I am using your solution. Is there a way to do it faster by setting all columns and associated values for a given row in one go, one row at a time? I mean, instead of passing [(r, c1, v1), (r, c2, v2)], is there a way to pass r: (c1,v1), (c2,v2)... – user58925 Sep 14 '18 at 12:31
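Following up on the comments above, a minimal sketch of the streaming idea, assuming the question's files and objects (qMatIndexes.txt, iN_list, jN_list, mDict, cjson, and a Cplex object with variables already added) are in scope. It performs the same per-row arithmetic as create_qMat_Disk, but pushes each row straight into CPLEX via set_quadratic_coefficients and then discards it, so the full qMat never exists in memory. Since qMatIndexes.txt stores only the upper-triangle entries (c >= r) and CPLEX keeps Q symmetric (setting q(i, j) also sets q(j, i)), no explicit mirroring is needed. Whether this is fast enough for 100,000 rows would have to be benchmarked:

def stream_Q_into_cplex(prob, mDict, iN_list, jN_list):
    # Same row arithmetic as create_qMat_Disk, but no global qMat is built.
    with open('qMatIndexes.txt', 'r') as nfile:
        for row, line in enumerate(nfile):
            columns, q_list = cjson.decode(line)
            multiple = 2.0 * mDict[iN_list[row]] / mDict[jN_list[row]]
            vals = [mDict[q] * multiple for q in q_list]
            # One batch of (i, j, value) triples per row; CPLEX mirrors
            # each entry into (j, i) automatically.
            prob.objective.set_quadratic_coefficients(
                zip([row] * len(columns), columns, vals))

# Usage, in place of my_prob.objective.set_quadratic(qMat):
# stream_Q_into_cplex(my_prob, mDict, iN_list, jN_list)

If you keep the list-of-lists approach instead, preallocate the structure once with a comprehension (replicating the outer list with `* n` would alias the same inner lists n times) and overwrite the values in place for each new mDict.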

0 Answers