An obvious way to solve this problem is to iterate through all string lengths
in some predetermined order, updating the width of the column each string
belongs to, and stop as soon as the sum of column widths exceeds the terminal width. Then
repeat the process with one fewer column (or, for the "arrange down" case, one more row)
until it succeeds. There are three natural choices for this predetermined
order:
- by row (consider all strings in the first row, then in row 2, 3, ...)
- by column (consider all strings in the first column, then in column 2, 3, ...)
- by value (sort the strings by length, then iterate over them in that order, starting from the longest one).
These three approaches work fine for the "arrange across" sub-problem. But for the "arrange down"
sub-problem they all have worst-case time complexity O(n^2), and the first two
methods show quadratic behavior even on random data. The "by value" approach is pretty
good on random data, but a worst case is easy to construct: just put short
strings in the first half of the list and long strings in the second half.
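For reference, this is roughly what the straightforward "by column" solution of the "arrange down" sub-problem looks like (a simplified sketch, not the code used in the benchmark below; arrangeDownNaive, len and pageWidth are illustrative names). Every candidate row count rescans the lengths, which is where the quadratic worst case comes from:
#include <algorithm>
#include <cstdint>
#include <vector>

// Sketch of the straightforward "arrange down" check: try row counts one by one,
// computing each column width by scanning the strings that fall into that column.
// Each attempt costs up to O(n), and up to O(n) row counts may be tried.
inline std::uint32_t arrangeDownNaive(const std::vector<std::uint32_t>& len,
                                      std::uint32_t pageWidth)
{
    const std::size_t n = len.size();
    for (std::size_t rows = 1; rows <= n; ++rows)
    {
        std::uint32_t sum = 0;
        for (std::size_t col = 0; col * rows < n && sum <= pageWidth; ++col)
        {
            std::uint32_t width = 0; // widest string in this column
            for (std::size_t i = col * rows; i < n && i < (col + 1) * rows; ++i)
                width = std::max(width, len[i]);
            sum += width;
        }
        if (sum <= pageWidth) // success: all columns fit on the page
            return std::uint32_t((n + rows - 1) / rows); // number of columns
    }
    return 0; // some single string is wider than the page
}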
Here I describe an algorithm for the "arrange down" case that does not have these
disadvantages.
Instead of inspecting each string length separately, it determines the width of each
column in O(1) time with the help of a range maximum query (RMQ). The width of the first
column is just the maximum value in the range (0 .. num_of_rows), the width of the next one is
the maximum value in the range (num_of_rows .. 2*num_of_rows), etc.
To answer each query in O(1) time, we prepare an array of maximum
values over the ranges (0 .. 2^k), (1 .. 1 + 2^k), ..., where k is
the largest integer such that 2^k is not greater than the current number of rows. Each range maximum query is then computed as the maximum of two entries of this array.
When the number of rows grows too large, we update this query array from k to k+1 (each such update needs O(n) range queries).
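A small made-up example: suppose the current number of rows is 5, so k = 2 and each entry of the query array holds the maximum over a window of 2^2 = 4 consecutive lengths. The width of the first column (lengths 0 .. 4) is then the maximum of the windows starting at positions 0 and 1, which together cover exactly those five lengths; in the code below the second window's index is computed as j = i + rows - ranges.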
Here is a C++14 implementation:
template<class PP>
uint32_t arrangeDownRMQ(Vec& input)
{
    auto rows = getMinRows(input);
    auto ranges = PP{}(input, rows);
    while (rows <= input.size())
    {
        if (rows >= ranges * 2)
        { // lazily update RMQ data
            transform(begin(input) + ranges, end(input), begin(input),
                      begin(input), maximum);
            ranges *= 2;
        }
        else
        { // use RMQ to get widths of columns and check if all columns fit
            uint32_t sum = 0;
            for (Index i = 0; sum <= kPageW && i < input.size(); i += rows)
            {
                ++g_complexity;
                auto j = i + rows - ranges;
                if (j < input.size())
                    sum += max(input[i], input[j]);
                else
                    sum += input[i];
            }
            if (sum <= kPageW)
            {
                return uint32_t(ceilDiv(input.size(), rows));
            }
            ++rows;
        }
    }
    return 0;
}
Here PP is an optional preprocessing step: in the simple case this function object does nothing and returns 1.
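The snippet also relies on a few helpers defined elsewhere in the benchmarking program linked further below (Vec, Index, Data, maximum, kPageW, g_complexity, getMinRows, ceilDiv). For completeness, here is one possible set of definitions, just enough to make the code above compile on its own; treat it as a sketch of my assumptions, not as the actual benchmark code:
#include <algorithm>
#include <cstdint>
#include <numeric>
#include <vector>
using namespace std; // the snippets use unqualified std names

using Data  = uint32_t;            // length of one string
using Index = size_t;              // index into the array of lengths
using Vec   = vector<Data>;        // all string lengths
const Data kPageW = 80;            // page (terminal) width
uint64_t g_complexity = 0;         // iteration counter, used only for statistics

// binary functor for transform / partial_sum
const auto maximum = [](Data a, Data b) { return max(a, b); };

inline Index ceilDiv(Index a, Index b) { return (a + b - 1) / b; }

inline Index getMinRows(const Vec& input)
{ // fewer rows cannot possibly fit: the sum of column widths is at least total / kPageW
    auto total = accumulate(begin(input), end(input), uint64_t{});
    return Index(max<uint64_t>(1, (total + kPageW - 1) / kPageW));
}

struct NoPreproc
{ // the "do nothing" PP: no preprocessing, every RMQ range has length 1
    Index operator () (Vec&, Index) const { return 1; }
};
With these definitions the simple variant is instantiated as arrangeDownRMQ<NoPreproc>(lengths).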
To determine the worst-case time complexity of this algorithm, note that the outer loop starts
with rows = n * v / m (where v is the average string length and m is the page width) and
stops with at most rows = n * w / m (where w is the largest string length).
The number of iterations of the "query" loop is not greater than the number of columns, i.e.
n / rows. Summing these iterations over all row counts is a partial harmonic series, which gives
O(n * (ln(n*w/m) - ln(n*v/m))) = O(n * log(w/v)): linear time with a small constant factor.
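For a rough feel with made-up numbers: with n = 1'000'000 strings, average length v = 5 and longest length w = 50, the factor is ln(w/v) = ln(10) ≈ 2.3, so the query loops perform only about 2.3 million iterations in total, independent of the page width m.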
To this we should add the time needed for all update operations, which is O(n log n) in total
(at most log2(n) doublings, each an O(n) pass), so the whole algorithm is O(n * log n).
But most of these updates happen before the first query operation, while the range length grows
from 1 up to the initial number of rows; if we perform no update operations until query operations
start, the time spent on updates, and with it the complexity of the whole algorithm, decreases to
O(n * log(w/v)).
To make this possible we need an algorithm that fills the RMQ array with maximums
of sub-arrays of a given length. I tried two possible approaches, and the
algorithm based on a pair of stacks
seems slightly faster. Here is a C++14 implementation (pieces of the input array are reused to implement both stacks, which lowers memory requirements and simplifies the code):
template<typename I, typename Op>
auto transform_partial_sum(I lbegin, I lend, I rbegin, I out, Op op)
{ // maximum of the element in first interval and prefix of second interval
    auto sum = typename I::value_type{};
    for (; lbegin != lend; ++lbegin, ++rbegin, ++out)
    {
        sum = op(sum, *rbegin);
        *lbegin = op(*lbegin, sum);
    }
    return sum;
}

template<typename I>
void reverse_max(I b, I e)
{ // for each element: maximum of the suffix starting from this element
    partial_sum(make_reverse_iterator(e),
                make_reverse_iterator(b),
                make_reverse_iterator(e),
                maximum);
}

struct PreprocRMQ
{
    Index operator () (Vec& input, Index rows)
    {
        if (rows < 4)
        { // no preprocessing needed
            return 1;
        }
        Index ranges = 1;
        auto b = begin(input);
        while (rows >>= 1)
        {
            ranges <<= 1;
        }
        for (; b + 2 * ranges <= end(input); b += ranges)
        {
            reverse_max(b, b + ranges);
            transform_partial_sum(b, b + ranges, b + ranges, b, maximum);
        }
        // preprocess inconvenient tail part of the array
        reverse_max(b, b + ranges);
        const auto d = end(input) - b - ranges;
        const auto z = transform_partial_sum(b, b + d, b + ranges, b, maximum);
        transform(b + d, b + ranges, b + d, [&](Data x){return max(x, z);});
        reverse_max(b + ranges, end(input));
        return ranges;
    }
};
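Putting the pieces together, a call might look like this (a usage sketch based on the assumed helper definitions above; the lengths are made up):
#include <iostream>

int main()
{
    Vec lengths = {3, 7, 2, 11, 4, 6, 5, 9, 2, 8}; // made-up string lengths
    Vec copy = lengths;                            // arrangeDownRMQ overwrites its input

    // simple variant: RMQ data is built only by doubling inside the main loop
    cout << arrangeDownRMQ<NoPreproc>(lengths) << " columns\n";
    // preprocessed variant: RMQ data is prepared for the initial row count up front
    cout << arrangeDownRMQ<PreprocRMQ>(copy) << " columns\n";
}
Both calls return the same number of columns; the preprocessed variant only differs in how many update passes it performs.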
In practice there is a much higher chance to see a short word than a long word:
shorter words outnumber longer ones in English text, and shorter decimal representations
of natural numbers prevail as well. So I chose a (slightly modified) geometric distribution of string
lengths to evaluate the various algorithms. Here is the whole benchmarking program (in
C++14 for gcc). An older version of the same program contains some obsolete tests and different implementations of some algorithms: TL;DR. And here are the results:

For "arrange across" case:
- "by column" approach is the slowest
- "by value" approach is good for n = 1000..500'000'000
- "by row" approach is better for small and very large sizes
For "arrange down" case:
- "by value" approach is OK, but slower than other alternatives (one possibility to make it faster is to implement divisions via multiplications), also it uses more memory and has quadratic worst-case time
- simple RMQ approach is better
- RMQ with preprocessing the best: it has linear time complexity and its memory bandwidth requirements are pretty low.
It may also be interesting to see the number of iterations needed by each algorithm.
Here all predictable parts are excluded (sorting, summing, preprocessing, and RMQ
updates):
