
As part of a larger simulation program (the context of which I'm happy to share, but which is not relevant to the question), I have run across the following problem and am looking for a good algorithm.

The problem: You are given a floating-point array f of length n (with elements f_1, ..., f_n) specifying a point in n-dimensional space. It can fairly be assumed that the sum of f is 0.0 (subject to floating point precision).

The sought: Find the integer array i of length n (with elements i_1, ..., i_n), specifying a grid point in n-dimensional space, such that the sum of i is exactly 0 and d(f,i), a suitable measure of distance between f and i, is minimized.

As for the suitable measure, I think the best one would be the one that minimizes the relative error (i.e., sum over j of (i_j/f_j-1)^2), but minimizing ordinary euclidean distance (i.e., sum over j of (i_j-f_j)^2) might work too.
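For concreteness, the two candidate measures can be written out as follows (a sketch; the function names are mine, not part of the problem statement):

```python
def relative_loss(f, i):
    # sum over j of (i_j/f_j - 1)^2; undefined when some f_j == 0
    return sum((ij / fj - 1.0) ** 2 for fj, ij in zip(f, i))

def euclidean_loss(f, i):
    # sum over j of (i_j - f_j)^2
    return sum((ij - fj) ** 2 for fj, ij in zip(f, i))
```

Note that the relative measure is undefined when some f_j is 0 and is dominated by components near zero, which may be a practical reason to prefer the Euclidean one.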

The best algorithm that readily occurs to me is to guess a suitable point on the i grid (with sum 0) and then repeatedly switch to the neighboring grid point (with sum 0) with the minimal distance until all neighbors have a larger distance. Given the convexity of the distance function, this should converge on the solution.

But that algorithm seems clunky. Can anyone do better?

There is a related discussion at How to round floats to integers while preserving their sum? but it does not reach the optimality point I'm looking for.

ADDENDUM ON CONTEXT: In case it is of interest (and also because I thought it was kind of cool), let me specify the context in which the problem arose.

The problem occurs as part of a trading simulation (which is part of a yet larger simulation). At each location, agents offer to trade a number of commodities. As each location and commodity is handled separately, we can focus on one location and commodity and deal with them seriatim.

Each agent j has an amount of currency c_j and a quantity of the commodity q_j, which are and have to remain integral. Each agent also specifies a real-valued, continuous, non-negative, monotonically non-increasing demand function d_j(p) which essentially represents how many units of the commodity the agent would like to own at any given price.

Trades are executed in the following steps.

  1. For each agent j calculate a budget-constrained demand function b_j(p) = min (d_j(p), q_j + c_j/p)
  2. Define the total quantity q_tot (sum over j of q_j) and the total budget-constrained demand function b_tot(p) (sum over j of b_j(p)).
  3. Find the equilibrium price p_eq for which b_tot(p_eq) = q_tot, using Newton's method. If there is no equilibrium price, return.
  4. Define the traded quantity for each agent f_j as b_j(p_eq)-q_j (which is positive for net purchases and negative for net sales).
  5. Agents for which |f_j| is very small (e.g., less than 1/10) are removed from the calculation.
  6. Each q_j is adjusted to q_j+f_j and each c_j to c_j-p_eq*f_j.
  7. It is at this point that the problem arises. f_j and p_eq are floating point, but the q_j and c_j need to remain integral. To avoid creating or destroying currency or commodities, the arrays f_j and (-p_eq*f_j) need to be consistently rounded to integers.
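The last four steps might be sketched as follows (all names here are hypothetical; `round_preserving_sum` stands in for whatever algorithm answers the main question):

```python
def settle_trades(agents, p_eq, round_preserving_sum, threshold=0.1):
    """Sketch of steps 4-7. `agents` is a list of objects with integer
    fields q (commodity) and c (currency) and a method b(p) giving the
    budget-constrained demand. `round_preserving_sum` rounds a float
    array to an integer array with the same (zero) sum."""
    # Step 4: traded quantity per agent
    f = [a.b(p_eq) - a.q for a in agents]
    # Step 5: drop agents whose trade is negligible
    active = [(a, fj) for a, fj in zip(agents, f) if abs(fj) >= threshold]
    # Step 7: round f and -p_eq*f consistently, so that no commodity
    # or currency is created or destroyed
    dq = round_preserving_sum([fj for _, fj in active])
    dc = round_preserving_sum([-p_eq * fj for _, fj in active])
    # Step 6: apply the integral adjustments
    for (a, _), dqj, dcj in zip(active, dq, dc):
        a.q += dqj
        a.c += dcj
```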
CarlEdman
  • You say an "n-dimensional floating-point array", but do you mean an array of size n? – Vaughn Cato Feb 28 '15 at 13:56
  • Yes. I was thinking of the problem geometrically with the input array specifying a point in n-dimensional space. – CarlEdman Feb 28 '15 at 13:58
  • Seems like a branch-and-bound algorithm would be your best bet. – Vaughn Cato Feb 28 '15 at 14:01
  • @VaughnCato True. I think that would be rather similar (given the properties of the distance function) to what I described as a possible solution above. My reason for asking was the hope that somebody had something faster and/or more elegant. – CarlEdman Feb 28 '15 at 14:05
  • Since this is an integer programming problem, I doubt that there is a polynomial-time algorithm for it. I think branch-and-bound would be quite good with some tuning. I'll write something up. – Vaughn Cato Feb 28 '15 at 14:08
  • How large is n typically? – Vaughn Cato Feb 28 '15 at 14:10
  • @VaughnCato Thanks, that would be helpful. While elegance and speed are always good, I expect n to be typically 2 or 3 and almost always in the single digits, and there is no real-time constraint, so speed is not a strong requirement. – CarlEdman Feb 28 '15 at 14:12
  • I updated my answer to provide a general algorithm that works in any dimension. It took me some time to find it, and I updated my answer in the process. The algorithm computes the solution directly. I also provide the Python code to compute it. You may want to compare its result and performance with @Vaughn Cato's. I hope you will reconsider the accepted answer selection. – chmike Mar 01 '15 at 11:39

2 Answers


When n = 2

This is the easy one. Define i_k to be computed with the algorithm below. The sum of the i_k will always be 0 and the Euclidean distance will always be minimal.

if (f[k] < 0)
   i[k] = int(f[k]-0.5);
else
   i[k] = int(f[k]+0.5);

int() is a function that returns the integer part of a floating-point number, i.e., it truncates toward zero.
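One way to express this rounding rule in Python (my rendering of the C-like snippet above): note that Python 3's built-in `round()` rounds halves to even, whereas the algorithm needs halves rounded away from zero.

```python
from math import copysign

def round_half_away(x):
    # round to the nearest integer, with halves away from zero,
    # matching int(f - 0.5) / int(f + 0.5) above
    return int(x + copysign(0.5, x))
```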

Explanation

Let us define e_k such that f_k = i_k + e_k.

For n = 2, f_0 = -f_1. The two f_k have the same magnitude but opposite signs. Because rounding to the nearest integer (halves away from zero) is symmetric about 0, the two errors also have the same magnitude but opposite signs. Because sum(f) = sum(i) + sum(e), and sum(e) and sum(f) both equal 0, sum(i) = 0.

In addition to balancing the magnitudes of the two errors, rounding to the nearest integer minimizes each error. The sum(e_k^2) will be minimal.

When n = 3

We compute i_k as above. The sum(i) may then take the value -1, 0 or 1.

When sum(i) = -1, we have to increment one of the i_k. Pick the i_k with the biggest e_k (all e_k are positive).

When sum(i) = 1, we have to decrement one of the i_k. Pick the i_k with the smallest e_k (all e_k are negative).

Explanation

With n = 3, we have 3 error values with |e_k| < 0.5, so |sum(e)| can never reach 1.5. Since sum(i) = -sum(e) must be an integer, sum(e) can only take the value -1, 0 or 1, and so can sum(i).

|sum(e)| = 1 only when all e_k have the same sign: because each |e_k| < 0.5, you always need three errors of the same sign to reach 1 or -1. Note that this is statistically less frequent than the case where the signs differ and sum(i) = 0.

How do we decide which i_k to pick?

When sum(i) = 1, then sum(e) = -1 and all e_k are thus negative. We have to decrement one i_k. Decrementing an i_k is balanced by incrementing its e_k, because sum(i) + sum(e) = 0. We should thus pick the e_k for which incrementing yields an error with the smallest magnitude. This is the e_k closest to -0.5, and thus the smallest e_k. This ensures that sum(e_k^2) is minimal.

The same logic applies when sum(i) = -1. In this case sum(e) = 1 and all e_k are positive. Incrementing one i_k is balanced by decrementing its e_k. By picking the e_k closest to 0.5, we get the smallest sum(e_k^2).
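To make the sum(i) = 1 case concrete, here is a small worked example (mine, not part of the original answer):

```python
# worked example for the sum(i) = 1 branch (n = 3)
f = [0.6, 0.7, -1.3]
i = [round(x) for x in f]              # nearest-integer rounding (no .5 values here) -> [1, 1, -1]
e = [fk - ik for fk, ik in zip(f, i)]  # approximately [-0.4, -0.3, -0.3], all negative
k = e.index(min(e))                    # index of the smallest e_k (closest to -0.5)
i[k] -= 1                              # decrement that i_k -> [0, 1, -1], which now sums to 0
```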

When n = 4

In this case sum(i) is still limited to the values -1, 0 and 1.

When sum(i) = -1, pick the biggest e_k.

When sum(i) = 1, pick the smallest e_k.

Explanation

|sum(e)| can't reach 2, because each |e_k| < 0.5 and there are only 4 errors. This is why sum(i) is still limited to the values -1, 0 and 1.

The difference from n = 3 is that now we may have 3 or 4 e_k with the same sign. But by applying the above rule, sum(e_k^2) remains minimal.

Generalization

When n > 4, |sum(e)| can be bigger than 1. In this case we have to modify more than one i_k.

The general algorithm is then as follows:

sum(i) -> m
when m = 0,
    we are done.
when m < 0,
    increment the -m i_k with the biggest e_k.
when m > 0,
    decrement the m i_k with the smallest e_k.
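The two correction loops can equivalently be done in one pass by sorting the indices on e_k; a compact variant (my sketch, not the answer's code):

```python
def balance(pf):
    # round to nearest (halves away from zero), then fix the residual sum
    pi = [int(x + (0.5 if x >= 0 else -0.5)) for x in pf]
    m = sum(pi)
    # order indices by error e_k = f_k - i_k, largest error first
    order = sorted(range(len(pf)), key=lambda k: pf[k] - pi[k], reverse=True)
    if m < 0:
        for k in order[:-m]:   # increment the -m entries with the biggest e_k
            pi[k] += 1
    elif m > 0:
        for k in order[-m:]:   # decrement the m entries with the smallest e_k
            pi[k] -= 1
    return pi
```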

Here is the Python 2.7 code:

def solve(pf):
    n = len(pf)
    
    # construct array pi from pf
    pi = [int(round(val)) for val in pf]
    print "pi~:", pi
    
    # compute the sum of the integers
    m = sum(val for val in pi)
    print "m :", m
    
    # if the sum is zero, we are done
    if m == 0:
        return pi
        
    # compute the errors
    pe = [pf[k]-pi[k] for k in xrange(n)]
    print "pe :", pe

    # correct pi when m is negative    
    while m < 0:
        # locate the pi with biggest error
        biggest = 0
        for k in xrange(1,n):
            if pe[k] > pe[biggest]:
                biggest = k
                
        # adjust this integer i
        pi[biggest] += 1
        pe[biggest] -= 1
        m += 1
    
    # correct pi when m is positive    
    while m > 0:
        # locate the pi with smallest error
        smallest = 0
        for k in xrange(1,n):
            if pe[k] < pe[smallest]:
                smallest = k
        
        # adjust this integer i
        pi[smallest] -= 1
        pe[smallest] += 1
        m -= 1
   
    return pi
    
    
if __name__ == '__main__': 
    print "Example case when m is 0"    
    pf = [1.1, 2.2, -3.3]
    print "pf :", pf
    pi = solve( pf )
    print "pi :", pi
    
    print "Example case when m is 1"    
    pf = [0.6, 0.7, -1.3]
    print "pf :", pf
    pi = solve( pf )
    print "pi :", pi
    
    print "Example case when m is -1"    
    pf = [0.4, 1.4, -1.8]
    print "pf :", pf
    pi = solve( pf )
    print "pi :", pi
chmike
  • Thank you very much. That is a good and straightforward solution for the Euclidean metric case. – CarlEdman Mar 01 '15 at 14:52
  • Experimentally, this seems to always give the same answer as my branch-and-bound approach. The logic also makes sense to me. Do you know the subclass of integer programming problems that this solves, such that this greedy approach is known to be optimal? (just curious) – Vaughn Cato Mar 01 '15 at 14:54
  • I just tried to find a direct solution to the problem. I don't know what subclass of integer programming problems it solves. – chmike Mar 01 '15 at 19:23

This is an integer programming problem. Branch and Bound is a simple approach that can be quite efficient in practice with good bounding conditions.

I've implemented a simple Branch and Bound algorithm. The main idea is to try the next higher and lower integer for each member of the array. At each step, we try whichever one produces less loss first. Once we've found a potential solution, we keep it if the solution is better than the best we've found so far, then we backtrack. If at any point we find that the loss of our partial solution is worse than the best total loss we've found, then we can prune that branch.

Here is a Python implementation of a basic branch-and-bound solution. There are many ways it could be optimized further, but this shows the basic idea:

from sys import stderr, stdout
from math import floor, ceil
import random

def debug(msg):
  #stderr.write(msg)
  pass

def loss(pf,pi):
  return sum((pf[i]-pi[i])**2 for i in range(0,len(pf)))

class Solver:
  def __init__(self,pf):
    n = len(pf)
    self.pi = [0]*n
    self.pf = pf
    self.min_loss = n
    self.min_pi = None
    self.attempt_count = 0

  def test(self):
    """Test a full solution"""
    pi = self.pi
    pf = self.pf
    assert sum(pi)==0
    l = loss(pf,pi)
    debug('%s: %s\n'%(l,pi))
    if l<self.min_loss:
      self.min_loss = l
      self.min_pi = pi[:]

  def attempt(self,i,value):
    """Try adding value to the partial solution"""
    self.pi[i] = int(value)
    self.extend(i+1)
    self.attempt_count += 1

  def extend(self,i):
    """Extend the partial solution"""
    partial = self.pi[:i]
    loss_so_far = loss(self.pf[:i],partial)
    debug('%s: pi=%s\n'%(loss_so_far,partial))
    if loss_so_far>=self.min_loss:
      return
    if i==len(self.pf)-1:
      self.pi[i] = -sum(partial)
      self.test()
      return
    value = self.pf[i]
    d = value-floor(value)
    if d<0.5:
      # Try the next lower integer first, since that causes less loss
      self.attempt(i,floor(value))
      self.attempt(i,ceil(value))
    else:
      # Try the next higher integer first
      self.attempt(i,ceil(value))
      self.attempt(i,floor(value))

def exampleInput(seed):
  random.seed(seed)
  n = 10
  p = [random.uniform(-100,100) for i in range(0,n)]
  average = sum(p)/n
  pf = [x-average for x in p]
  return pf

input = exampleInput(42)
stdout.write('input=%s\n'%input)
stdout.write('sum(input)=%s\n'%sum(input))

solver=Solver(input)
solver.extend(0)

stdout.write('best solution: %s\n'%solver.min_pi)
stdout.write('sum(best): %s\n'%sum(solver.min_pi))
stdout.write('attempts: %s\n'%solver.attempt_count)
stdout.write('loss: %s\n'%loss(input,solver.min_pi))
assert sum(solver.min_pi)==0
Vaughn Cato
  • Thank you! That is excellent. As it so happens, I'm writing my project in Python too, so adapting your solution will involve a minimum of recoding. – CarlEdman Feb 28 '15 at 17:28