1

This is an interview question: How to process a integer matrix to find the average for the elements of any sub-rectangle in O(1)? As the comments say, we should calculate the prefix sums.

Well, it is embarrassing but even having asked a similar question I did not get it.

Community
  • 1
  • 1
Michael
  • 41,026
  • 70
  • 193
  • 341
  • 1
    Did you understand @Sven Marnach's solution for the 1-dimensional case? – Aasmund Eldhuset Mar 05 '11 at 15:19
  • Yes, I understand that having calculated prefix sums of the array (b[i] = a[0] + ... + a[i]), I can easily get the sum (and hence the average) of any interval, i.e. sum of elements in interval [i,j] = b[j] - b[i] – Michael Mar 05 '11 at 15:38
  • You want a Summed Area Table. See this question: http://stackoverflow.com/questions/2277749/calculate-the-sum-of-elements-in-a-matrix-efficiently The average is obviously just the sum divided by the number of elements. – Alan Mar 05 '11 at 16:40

3 Answers3

4

Though you have a couple of reasonable answers, this seems like a question for which a picture is worth several thousand words.

enter image description here

Okay -- what we want is the area of rectangle 4. Our pre-computed area, however, only tells use the area of the whole rectangle starting from the origin at the top left, and extending to the bottom-right corer of 4. To compute only the area of 4, we have to subtract the areas above and to the left (1, 2, & 3). The areas of 2 & 3 are the same way though: we can only get their areas starting from the top left and extending to their bottom right. That means we can get the sum of their areas, but the area of rectangle 1 is included in both of those.

So, to get the area of rectangle 4, we find the area at its bottom right corner. We find the area at the bottom-right corners of 2 and 3 and add them together. We then subtract the area at the bottom-right corner of rectangle 1 since it was included in our total twice. That gives us the total area of rectangles 1, 2 and 3. We subtract that from the area for the bottom-right corner of rectangle 4, and that gives us the area of rectangle 4 by itself.

Just for example, let's assume I managed to get all those rectangles the same size, so we'll call the area for each '1'. So, the area given at the bottom-right corner of each rectangle will be:

  1. 1
  2. 2
  3. 2
  4. 4

To keep things simple, let's define each of these as pc_area(x) (i.e. precomputed-area we'd get at the bottom-right corner of rectangle x).

So, we take: pc_area(4) - (pc_area(2) + pc_area(3) - pc_area(1)) Substituting the numbers, we get: 4 - (2 + 2 -1), or 4-3, which obviously gives 1, the answer we originally defined as correct.

Jerry Coffin
  • 476,176
  • 80
  • 629
  • 1,111
3

See @Sven Marnach'answer to the post @Michael linked to. The prefix sum algorithm he outlines is based on the following idea: if you have precomputed the sums of the following element sequences: [0, 0], [0, 1], [0, 2], ..., [0, n-1], you can compute the sum of the sequence [a, b] in constant time as [0, b] - [0, a-1]. The same idea can be applied to two-dimensional arrays. Denote the subarray with its upper left corner at (a, b) and its lower right corner at (c, d) as [(a, b), (c, d)]. We will treat such a subarray similarly to how we treated sequences in the 1-D case. Precompute the sums of all subarrays having their upper left corner at (0, 0): [(0, 0), (0, 0)], [(0, 0), (0, 1)], [(0, 0), (0, 2)], ..., [(0, 0), (0, m-1)], [[(0, 0), (1, 0)], [(0, 0), (1, 1)], ..., [(0, 0), (1, m-1)], ..., [(0, 0), (n-1, m-1)]. There are n*m such subarrays, and the sum of each can be computed in constant time based on the sums of smaller subarrays. If we are now asked to produce the sum of the subarray [(a, b), (c, d)], we can find it as [(0, 0), (c, d)] - [(0, 0), (a-1, d)] - [(0, 0), (c, b-1)] + [(0, 0), (a-1, b-1)]. Draw it on paper, and you'll see how the subarrays overlap and why we need to add the last subarray.

Aasmund Eldhuset
  • 37,289
  • 4
  • 68
  • 81
  • 1
    In more geometric terms, after you've figured out how to calculate the sum of any rectangle that starts from the top-left (this can be done with the subarray-sum thing), you can figure out the sum of an arbitrary rectangle by subtracting and adding rectangles that start from the top-left. ...or just draw it on paper. – Matti Virkkunen Mar 05 '11 at 15:49
  • @Matti Virkkunen: +1 for being less verbose than me. :-) – Aasmund Eldhuset Mar 05 '11 at 15:51
1

I think you are misleaded by the assumption that you just get a matrix / list of nubmers.

In this case the requirement is not possible. The fastest you could do is an heuristic algorithm that is based on a sorting algorithm and you could achieve O(Log(n)).

This however will only be useful if there is an error tolerance and the distribution of values is not distored.

But you can very well solve your problem if you have control over the data structure. The common teqnique to do that is called prefix sums.

read up on it on wikipedia, they explain it better than i ever could.

But let me provide a very basic trivial analogy that may solve confusion.

You cannot search an element in a list in O(1). Log(n) is the best you can do, if its sorted or you sort it before.

However if you have control over the data structure you can build a hash index where you can directly pick it and have O(1).

Prefix sums is something similar. Its a datastructure that solves your problem, not an algorithm (well not of course its an algorithm but more on the other side of the line).

The Surrican
  • 29,118
  • 24
  • 122
  • 168
  • Based on the response to the other question, I think the OP is allowed to use a custom data structure to increase the performance of the queries after preprocessing. – Jeremiah Willcock Mar 05 '11 at 15:33
  • The time taken for preprocessing doesn't count towards the O(1), you only have to be able to find out answers in O(1) time after preprocessing. I can think of a preprocessed data structure that takes the same amount of memory and can answer the question in O(1). – Matti Virkkunen Mar 05 '11 at 15:36
  • thats correct! if the time of preprocessing doesnt count. which is a very valid situation because it can done one time for N number of runs. – The Surrican Mar 05 '11 at 16:07