17

For an algorithm competition training (not homework) we were given this question from a past year. Posted it to this site because the other site required a login.

This is the problem: http://pastehtml.com/view/c5nhqhdcw.html

Image didn't work so posted it here:

It has to run in less than one second and I can only think about the slowest way to do it, this is what I tried:

with open('islandin.txt') as fin:
    num_houses, length = map(int, fin.readline().split())
    tot_length = length * 4 # side length of square
    houses = [map(int, line.split()) for line in fin] # inhabited houses read into list from text file

def cost(house_no):
    money = 0
    for h, p in houses:
        if h == house_no: # Skip this house since you don't count the one you build on
            continue
        d = abs(h - house_no)
        shortest_dist = min(d, tot_length - d)    
        money += shortest_dist * p
    return money


def paths():
    for house_no in xrange(1, length * 4 + 1):
        yield house_no, cost(house_no)
        print house_no, cost(house_no) # for testing

print max(paths(), key=lambda (h, m): m) # Gets max path based on the money it makes

What I'm doing at the moment is going through each location and then going through each inhabited house for that location to find the max income location.

Pseudocode:

max_money = 0
max_location = 0
for every location in 1 to length * 4 + 1
    money = 0
    for house in inhabited_houses:
        money = money + shortest_dist * num_people_in_this_house
    if money > max_money
        max_money = money
        max_location = location

This is too slow since it's O(LN) and won't run in under a second for the largest test case. Can someone please simply tell me how to do it in the shortest run time (code isn't required unless you want to) since this has been bugging me for ages.

EDIT: There must be a way of doing this in less than O(L) right?

jamylak
  • 128,818
  • 30
  • 231
  • 230
197
  • 351
  • 3
  • 12
  • 5
    You should detail what your algorithm is suppose to do, code is cool, but natural language is also useful sometimes – AsTeR Jul 22 '12 at 13:25
  • @AsTeR Added a bit more of an explanation of how I did, I'll try to improve it a bit more. – 197 Jul 22 '12 at 13:31
  • Interesting! I'm on that. @197, do you know how to solve in O(L) a simpler problem, where there's a straight street of L houses? – Kos Jul 22 '12 at 14:16
  • OK, I think I've got that! Keyword: Prefix sums (with a twist); do you want more tips? – Kos Jul 22 '12 at 14:59
  • @Kos I haven't looked into a straight line but wouldn't the solution have to be smaller than O(L)? And yes I want more tips! – 197 Jul 22 '12 at 15:03
  • I've posted an answer and I'll expand it part-by-part (unless you really want me to post the whole solution at once?) And I believe O(L) is enough really. – Kos Jul 22 '12 at 15:47
  • I think it's possible to do it in O(N) time. You don't have to consider some of the uninhabited houses (depending on the layout). – zarkon Jul 25 '12 at 08:06

4 Answers4

10

Suppose the list houses is composed of pairs (x,pop) with 0 <= x < 4*L the location and pop the population.

The objective function, which we want to maximize, is

def revenue(i):
    return sum(pop * min((i-j)%(4*L), 4*L - (i-j)%(4*L)) for j,pop in houses)

The naive algorithm O(LN) algorithm is simply:

max_revenue = max(revenue(i) for i in range(4*L))

But it is incredibly wasteful to entirely re-evaluate revenue for each location.

To avoid that, notice that this is a piecewise-linear function; so its derivative is piecewise constant, with discontinuities at two kinds of points:

  • at house i, the derivative changes from slope to slope + 2*population[i]
  • at the point located opposite house i on the island, the derivative changes from slope to slope - 2*population[i]

This makes things very simple:

  1. We only have to examine actual houses or opposite-of-houses, so the complexity drops to O(N²).
  2. We know how to update the slope from house i-1 to house i, and it requires only O(1) time.
  3. Since we know the revenue and the slope at location 0, and since we know how to update the slope iteratively, the complexity actually drops to O(N): between two consecutive houses/opposite-of-houses, we can just multiply the slope by the distance to obtain the difference in revenue.

So the complete algorithm is:

def algorithm(houses, L):
    def revenue(i):
        return sum(pop * min((i-j)%(4*L), 4*L - (i-j)%(4*L)) for j,pop in houses)

    slope_changes = sorted(
            [(x, 2*pop) for x,pop in houses] +
            [((x+2*L)%(4*L), -2*pop) for x,pop in houses])

    current_x = 0
    current_revenue = revenue(0)
    current_slope = current_revenue - revenue(4*L-1)
    best_revenue = current_revenue

    for x, slope_delta in slope_changes:
        current_revenue += (x-current_x) * current_slope
        current_slope += slope_delta
        current_x = x
        best_revenue = max(best_revenue, current_revenue)

    return best_revenue

To keep things simple I used sorted() to merge the two types of slope changes, but this is not optimal as it has O(N log N) complexity. If you want better efficiency, you can generate in O(N) time a sorted list corresponding to the opposite-of-houses, and merge it with the list of houses in O(N) (e.g. with the standard library's heapq.merge). You could also stream from iterators instead of lists if you want to minimize memory usage.

TLDR: this solution achieves the lowest feasible complexity of O(N).

Generic Human
  • 6,027
  • 1
  • 17
  • 10
  • Are you sure this works? `algorithm([(2, 3), (4, 1), (11, 1), (12, 2)], 3)` gives `11` when it should give `33` or is my input wrong? – phant0m Jul 25 '12 at 16:34
  • Further, you write `We only have to examine actual houses or opposite-of-houses`, however, my algorithm is of the opinion that position `8` is ideal, but neither `8` nor its opposite `5` has a house. Or did you mean something else? – phant0m Jul 25 '12 at 16:46
  • In my solution the positions start from 0 and they must satisfy `0 <= x < 4*L`, so you have to replace 12 with 0 or change `(x, 2*pop)` to `(x%(4*L), 2*pop)`. When `L = 3`, the opposite of `8` is `8-2*L = 2`, which does have a house. Also note that sometimes the slope can be 0 so that a whole interval can be optimal. – Generic Human Jul 25 '12 at 16:53
  • Ah I see, I misinterpreted "opposite" the wrong way. Now it makes sense. Basically you simply combine all those steps in my algorithm `W(F)` and `W(B)` don't change. I could incorporate that into my solution, it's even visible in the diagram that these steps can be skipped :) PS: if you include @username, you can notify my (or somebody else) when you've made a new comment. – phant0m Jul 25 '12 at 19:25
  • Thanks, I think I mostly understand this solution except for the discontinuities part. Is there any way you could explain it to me a bit more simply? I don't see why it is `2 * population`, what is the `2` for? I think I can see how the slope works but just not sure about discontinuities. – 197 Jul 28 '12 at 12:34
  • @197: Let's say 3 people live in a house. When the house gets further away, you add `3` for each step, so the slope is `3`. Once it starts getting closer, you subtract `3` for each step. Thus, the slope changes from `+3` to `-3`, i.e. it changes by `2 * population`. A discontinuity is basically when a function "jumps" (in this case from `+3` to `-3`, there are no values in between.) – phant0m Jul 28 '12 at 15:14
9

Here is a less mathematically inclined solution that works in O(n).

Let us partition the houses (indexing starts at 0) into two disjoints sets:

  • F, "front", where people walk CCW to the house
  • B, "back", where people walk CW to the house

and a single house p that marks the current position where the plant would be built.

I have based my illustration on the example given in the image.

By convention, lets assign half the houses to F, and exactly one less to B.

  • F contains 6 houses
  • B contains 5 houses

With simple modular arithmetic, we can easily access the houses by (p + offset) % 12 thanks to Python's sane implementation of the modulo operator, quite unlike some other popular languages.

If we arbitrarily choose a position for p, we can determine the consumption of water in O(L) trivially.

We could do this all over again for a different position of p to arrive at a runtime of O(L^2).

However, if we only shift p by one position, we can determine the new consumption in O(1) if we make a somewhat clever observation: The amount of people living in F (or B respectively) determine how much the consumption of F changes when we set p' = p+1. (and some corrections because F itself will change). I have tried to depict this here to the best of my abilities.

algorithm depiction

We end up with a total running time of O(L).

The program for this algorithm is at the end of the post.

But we can do better. As long as no houses change between the sets, the cs and ws that are added will be zero. We can calculate how many of these steps there are and do them in one step.

Houses change sets when: - When p is on a house - When p is opposite of a house

In the following diagram, I have visualized stops the algorithm now makes to update the Cs and Ws. Highlighted is the house, that causes the algorithm to stop.

optimized algorithm

The algorithm begins at a house (or the opposite of one, we'll see why later), in this case that happens to be a house.

Again, we have both a consumption C(B) = 3*1 and C(F) = 2 * 1. If we shift p to the right by one, we add 4 to C(B) and subtract 1from C(F). If we shift p once again, the exact same thing happens.

As long as the same two sets of houses move closer and further away respectively, the changes to the Cs are constant.

We now change the definition of B slightly: It will now also contain p! (This does not change the above paragraphs regarding this optimized version of the algorithm).

This is done because when we move to the next step, we will add the weight of the houses that are moving away repeatedly. The house at the current position is moving away when p moves to the right, thus W(B) is the correct summand.

The other case is when a house stops moving away and comes closer again. In that case the Cs change drastically because 6*weight goes from one C to the other. That is the other case when we need to stop.

new calculations

I hope it is clear how and why this works, so I'll just leave the working algorithm here. Please ask if something is not clear.

O(n):

import itertools

def hippo_island(houses, L):
    return PlantBuilder(houses, L).solution

class PlantBuilder:
    def __init__(self, houses, L):
        self.L = L
        self.houses = sorted(houses)
        self.changes = sorted(
            [((pos + L /2) % L, -transfer) for pos, transfer in self.houses] + 
            self.houses)
        self.starting_position = min(self.changes)[0]

        def is_front(pos_population):
            pos = pos_population[0]
            pos += L if pos < self.starting_position else 0
            return self.starting_position < pos <= self.starting_position + L // 2

        front_houses = filter(is_front, self.houses)
        back_houses = list(itertools.ifilterfalse(is_front, self.houses))

        self.front_count = len(houses) // 2
        self.back_count = len(houses) - self.front_count - 1
        (self.back_weight, self.back_consumption) = self._initialize_back(back_houses)
        (self.front_weight, self.front_consumption) = self._initialize_front(front_houses)
        self.solution = (0, self.back_weight + self.front_weight)
        self.run()

    def distance(self, i, j):
        return min((i - j) % self.L, self.L - (i - j) % self.L)

    def run(self):
        for (position, weight) in self.consumptions():
            self.update_solution(position, weight)

    def consumptions(self):
        last_position = self.starting_position
        for position, transfer in self.changes[1:]:
            distance = position - last_position
            self.front_consumption -= distance * self.front_weight
            self.front_consumption += distance * self.back_weight

            self.back_weight += transfer
            self.front_weight -= transfer

            # We are opposite of a house, it will change from B to F
            if transfer < 0:
                self.front_consumption -= self.L/2 * transfer
                self.front_consumption += self.L/2 * transfer


            last_position = position
            yield (position, self.back_consumption + self.front_consumption)

    def update_solution(self, position, weight):
        (best_position, best_weight) = self.solution
        if weight > best_weight:
            self.solution = (position, weight)

    def _initialize_front(self, front_houses):
        weight = 0
        consumption = 0
        for position, population in front_houses:
            distance = self.distance(self.starting_position, position)
            consumption += distance * population
            weight += population
        return (weight, consumption)

    def _initialize_back(self, back_houses):
        weight = back_houses[0][1]
        consumption = 0
        for position, population in back_houses[1:]:
            distance = self.distance(self.starting_position, position)
            consumption += distance * population
            weight += population
        return (weight, consumption)

O(L)

def hippo_island(houses):
    return PlantBuilder(houses).solution

class PlantBuilder:
    def __init__(self, houses):
        self.houses = houses
        self.front_count = len(houses) // 2
        self.back_count = len(houses) - self.front_count - 1
        (self.back_weight, self.back_consumption) = self.initialize_back()
        (self.front_weight, self.front_consumption) = self.initialize_front()
        self.solution = (0, self.back_weight + self.front_weight)
        self.run()

    def run(self):
        for (position, weight) in self.consumptions():
            self.update_solution(position, weight)

    def consumptions(self):
        for position in range(1, len(self.houses)):
            self.remove_current_position_from_front(position)

            self.add_house_furthest_from_back_to_front(position)
            self.remove_furthest_house_from_back(position)

            self.add_house_at_last_position_to_back(position)
            yield (position, self.back_consumption + self.front_consumption)

    def add_house_at_last_position_to_back(self, position):
        self.back_weight += self.houses[position - 1]
        self.back_consumption += self.back_weight

    def remove_furthest_house_from_back(self, position):
        house_position = position - self.back_count - 1
        distance = self.back_count
        self.back_weight -= self.houses[house_position]
        self.back_consumption -= distance * self.houses[house_position]

    def add_house_furthest_from_back_to_front(self, position):
        house_position = position - self.back_count - 1
        distance = self.front_count
        self.front_weight += self.houses[house_position]
        self.front_consumption += distance * self.houses[house_position]

    def remove_current_position_from_front(self, position):
        self.front_consumption -= self.front_weight
        self.front_weight -= self.houses[position]

    def update_solution(self, position, weight):
        (best_position, best_weight) = self.solution
        if weight > best_weight:
            self.solution = (position, weight)

    def initialize_front(self):
        weight = 0
        consumption = 0
        for distance in range(1, self.front_count + 1):
            consumption += distance * self.houses[distance]
            weight += self.houses[distance]
        return (weight, consumption)

    def initialize_back(self):
        weight = 0
        consumption = 0
        for distance in range(1, self.back_count + 1):
            consumption += distance * self.houses[-distance]
            weight += self.houses[-distance]
        return (weight, consumption)

Result:

>>> hippo_island([0, 3, 0, 1, 0, 0, 0, 0, 0, 0, 1, 2])
(7, 33)
phant0m
  • 16,595
  • 5
  • 50
  • 82
  • 1
    what kind of software did you use to make the illustration? – Arnab Datta Jul 25 '12 at 21:26
  • 1
    @ArnabDatta I have used [this online LaTeX editor](http://www.codecogs.com/latex/eqneditor.php) and some primitive brush strokes in Photoshop. – phant0m Jul 26 '12 at 00:09
  • I'm not familiar with Python syntax but there is for loop in "run" and in "consumptions" methods, does it mean that it's n^2 ? – Rrr Jul 27 '12 at 23:11
  • 2
    No, it's not `n^2`. Both loops run only once. In fact, the loop inside `consumptions()` controls how many times the loop in `run()` runs, because `consumptions()` [generates](http://stackoverflow.com/questions/231767/the-python-yield-keyword-explained) the values over which the loop in `run()` iterates. – phant0m Jul 27 '12 at 23:47
  • @phant0m Thanks to improve my posts, I just saw in my notification box..And Nice answer +1. – Grijesh Chauhan Jan 07 '13 at 14:00
  • @GrijeshChauhan You're welcome and thanks to you, too. Your posts are quite good ;) Sidenote: You don't need to @-highlilght me when you comment on my own posts ;) – phant0m Jan 07 '13 at 14:08
  • @phant0m Thanks specially for last post.. You know I was beaten by my teacher several time for my bad English :P – Grijesh Chauhan Jan 07 '13 at 14:11
  • @GrijeshChauhan I hope not... By the way, it's not necessary or expected of you to leave a "thank you" comment beneath every post me or someone else edits. Comments are there to discuss the posts (or for you to complain about bad edits :P), ask questions etc. To return the favor, you can look at the person's answers instead and upvote, just like you did now ;) – phant0m Jan 07 '13 at 14:21
0

I'll provide some tips so that you can still have some challenge for yourself.


Let me start with a heavily simplified version:

There are N houses on a straight street, and either is either populated or empty.

0 1 1 0 1

Let's calculate the score for them, knowing that the n-th house has a score equal to the sum of all distances to other houses which are non-empty. So the score of first house is 1+2+4 = 7, since there are 3 other populated houses and they are in distances 1, 2, 4.

The full array of scores looks like:

7 4 3 4 5

How to calculate that? The obvious approach is...

for every house i
    score(i) = 0
    for every other house j
        if j is populated, score(i) += distance(i, j)

This gives you O(N^2) complexity. However there's a quicker way that calculates all the scores in O(N), because it doesn't feature a nested loop. It's related to prefix sums. Can you find it?

Kos
  • 70,399
  • 25
  • 169
  • 233
0

There is no need to calculate every house!!!

It's not fully developed, but I think it is worth thinking about it:

modulo N

N is the number of all houses, n shall be the "adress" (number) of some of the houses.

If you walk around the island, you will find that n is raising by 1 for each house you pass. If you reach the house where n is N, then the next house has the number 1.

Let us use a different system of numbering: increase every house-number by 1. Then n goes from 0 to N-1. this is the same way how numbers modulo N will behave.

Litres is a function of the house-number n (modulo N)

You can calculate the amount of liters for each house-Number by building the sum of all products of distance and people living there.

You can also draw a graph of that function: x is n, y is the number of litres.

the function is periodic

If you understand what modulo means, you will understand that the graph you just did draw is just one periode of a periodic function, since Litre(n) ist eaqual to Litre(n + x * N) where x is a integer (that might be negative too).

if N is big, the function is "Pseudo-continuous"

What I mean is this: If N is really big, then the amount of litres will not change very much if you move from house a to its neighbour, house a+1. So you can use methods of interpolation.

you are looking for the place of the "global" maximum of a periodic pseudo-continuous function (only really global within one periode)

This is my suggestion:

Step 1: select a distance d that is bigger than 1 and smaller than N. I can't say why, but I would use d=int(sqrt(N)) (maybee there might be a better choice, try it out).
Step 2: calculate the Litres for House 0, d, 2d, 3d, ... Step 3: you will find some values that are higher than both of their neighbours. Use this high-points and their neighbours feed them using a method of interpolation to calculate more points close to that a high points (interval-splitting).

Repeat this interpolations for other high points as long as you have time (you have 1 second, which is a long time!)

Jump from one high-point to another if you see, that the global maximum must be elsewhere.

Hubert Schölnast
  • 8,341
  • 9
  • 39
  • 76