3

Heyo all.

Trying to get better at python and started doing leetcode problems. Im currently doing one, were the goal is to capture water. Link => https://leetcode.com/problems/trapping-rain-water/

problem is; it times me out for taking too long. My code is certainly inefficient. Afer googling around i found that .append is supposedly very slow / inefficient. So is .extend.

Cant find any obvious ways of making my code faster; hence my arrival here.

any response is much appreciated

class Solution:
    def trap(self, height: List[int]) -> int:
        
        max_height = max(height)
        water_blocks = 0
        for element in range(max_height):
            local_map = []
            element = element + 1
            for block in height:
                if block >= element:
                    local_map.extend([1])
                else:
                    local_map.extend([0])

            if local_map.count(1) > 1:
                first_index = local_map.index(1)
                reversed_list = local_map[::-1]
                last_index = len(local_map) - 1 - reversed_list.index(1)
                water_count = last_index - first_index - 1 - (local_map.count(1) - 2)
                water_blocks += water_count
            else:
                continue
        return water_blocks
  • Try using a list comprehension. That may pre-allocate the list all at once. – Barmar Jan 26 '22 at 21:51
  • @Barmar it doesn't, it uses `list.append`, more importantly, `.append` and `.extend` won't be the problem – juanpa.arrivillaga Jan 26 '22 at 21:56
  • 5
    " Afer googling around i found that .append is supposedly very slow / inefficient. So is .extend." This is totally incorrect. `.append` is amoritized constant time. It is *very* efficient, and almost certainly not the bottleneck. Much more suspicious are all the O(N) calls to `index` and `count`, as well as the reversals `[::-1]`, all of these are O(N). Although note, `local_map.extend([1])` is a silly way of doing `local_map.append(1)` – juanpa.arrivillaga Jan 26 '22 at 21:58
  • @juanpa.arrivillaga I was hoping that it might do something like `internal_result = [dummy_value] * len(iterator)` and then assign in place, rather than use `append()` internally, in cases where the length can be determined. – Barmar Jan 26 '22 at 22:03
  • 1
    @Barmar it doesn't. just check out `dis.dis`, there is a special opcode, LIST_APPEND that is use precisely to create list comprehensions. In any case, again, `list.append` is definitely not the performance problem here. – juanpa.arrivillaga Jan 26 '22 at 22:08
  • See https://stackoverflow.com/questions/45893768/how-do-i-find-out-what-parts-of-my-code-are-inefficient-in-python first. – mkrieger1 Jan 26 '22 at 22:12
  • @juanpa.arrivillaga i initially did .append but it had the same speed. Even tried making a list of [0] * len(height) and replace instead, but it changed the initial list so i had to .copy it which kinda defeats the purpose as it has to copy it max(height) amount of times either way. Might be the reversals. Was trying to find the first and last occurrence of the number 1. –  Jan 26 '22 at 22:13
  • 2
    @Salomon **again** it is *definitely not the `.append` or the `.extend`*. Your fundamental assumption in the question is just wrong. I don't know where you got that information, but it simply isn't true for CPython `list` objects, which have a highly tuned `list.append` algorithm that pretty much gives you amortized constant time append. But yes, as I mentioned, you have other operations, `.count`, `.index`, and `[::-1]`which are O(N). I'd start looking there. – juanpa.arrivillaga Jan 26 '22 at 22:15
  • Also see https://codereview.stackexchange.com/questions/225467/trapping-rain-water-in-python for some ideas. – mkrieger1 Jan 26 '22 at 22:16
  • @juanpa.arrivillaga gotcha. So the question would be something like; why is it inefficient and be it the reversals - is there a smart(efficient) way to find the first and last occurrence of a number? –  Jan 26 '22 at 22:16
  • @Salomon look, the whole point of leetcode is to get you to *find a good algorithm*. As it stands, this question is not really on-topic for stackoverflow without making it. self-contained question that is narrowly about doing something efficiently, see [ask] and the [help]. not to be too strident about it, but this is not a leetcode solution service. You should google the problem and look at different algorithm approaches to it if you are interested in improving your solution – juanpa.arrivillaga Jan 26 '22 at 22:19
  • @juanpa.arrivillaga Fair point. My goal isn't to find an easy solution to a specific problem but rather to figure out what part of my approach is //shit// so that I do not continue making the same mistake regardless of what I am working on. –  Jan 26 '22 at 22:23
  • I don't think this is an append/extend thing being slow. You have a nested for loop over `range(max_height)` and then over `height`. The problem states that the maximum values are 100,000 and 20,000 items respectively. So that's up to 2 billion times you're doing the nested iteration. That's why your code is being terminated for taking too long. A cursory glance at the problem and I reckon you could do it in a single loop over `height` (20,000 iterations max). – Dunes Jan 26 '22 at 22:39
  • @Dunes -- It can't be done in a single loop. When you get to a tall wall, you can't tell whether it will store water until you know whether there's another tall wall later. – Tim Roberts Jan 26 '22 at 23:02
  • I misspoke, I meant the problem has O(n) complexity and shouldn't need nested for loops. That is, once you've computed how much water is trapped between two walls, you never need to look at that part of the height map again. @TimRoberts – Dunes Jan 26 '22 at 23:50

3 Answers3

2

Although many of your count and index calls can be avoided, the two big nested loops might still be a problem. For the outer loop, max_height can be large number and the inner loop iterates over the full list. You might need to come up with a different algorithm.

I don't have a leetcode account, so I can't really test my code, but this would be my suggestion: It iterates over the height-list only once, with a small inner loop to find the next matching wall.

class Solution:
    def trap(self, h):
        water = 0
        current_height = 0
        for i, n in enumerate(h):
            # found a "bucket", add water
            if n < current_height:
                water += current_height - n
            else: # found a wall. calculate usable height
                current_height = self.n_or_max(h[i+1:], n)
        return water

    def n_or_max(self, h, n):
        local_max = 0
        for i in h:
            if i > local_max:
                local_max = i
                # that's high enough, return n
                if i >= n:
                    return n
        return local_max
Wups
  • 2,489
  • 1
  • 6
  • 17
  • 1
    FWIW, this is the winner. My code was taking 18 seconds on the long sample, yours takes 3ms. I tip my hat to you. – Tim Roberts Jan 27 '22 at 00:28
1

Here are some pointers:

  • Do not use list.count() or list.index() (that is, try to remove local_map.count(1), local_map.index(1) and reversed_list.index(1)). The first will loop (internally) over the whole list, which is obviously expensive if the list is large. The second will loop over the list until a 1 is found. Currently you even have two calls to local_map.count(1) which will always return the same answer, so at least just store the result in a variable. In your loop over blocks, you construct local_map yourself, so you do in fact know exactly what it contains, you should not have to search through it afterwards. Just put a few ifs into the first loop over blocks.
  • The operation local_map[::-1] not only runs over the whole list, but additionally copies the whole thing into a new list (backwards, but that's not really contributing to the issue). Again, this new list does not contain new information, so you can figure out the value of water_count without doing this.
  • The above is really the major issues. A slight further optimization can be obtained by eliminating element = element + 1. Just shift the range, as in range(1, max_height + 1).
  • Also, as written in the comments, prefer list.append(x) to list.extend([x]). It's not huge, but the latter has to create an additional list (of length 1), put x into it, loop over the list and append its elements (just x) to the large list. Finally, the length-1 list is thrown away. On the contrary, list.append(x) just appends x to the list, no temporary length-1 list needed.

Note that list.append() is not slow. It's a function call, which is always somewhat slow, but the actual data operation is fast: constant time and even cleverly amortized, as juanpa.arrivillaga writes.

jmd_dk
  • 12,125
  • 9
  • 63
  • 94
  • *"The above is really the major issues"* - Absolutely not. The real major issue is that it's a terrible algorithm. I'd be surprised if fixing those "major issues" the way you suggest even make the code just twice as fast. For some of the suggestions I wouldn't be surprised if they made it *slower*. A good *algorithm* on the other hand will likely make it *thousands* of times faster. – Kelly Bundy Jan 27 '22 at 00:09
  • Reducing several O(N) to O(1) will not make the code slower, for reasonable big input, which is what leetcode uses. Given a large enough problem size, it can make it arbitrarily faster. Sure a better algorithm altogether might exist, but then we don't optimize the code in the question, but rather implement a whole new solution. – jmd_dk Jan 27 '22 at 00:13
  • I didn't say all of them make it slower or that overall they'll make it slower. Just that some might. The ones where you propose to replace relatively fast O(N) operations with *"Just put a few ifs into the first loop over blocks"*, which also costs O(N) and might be slower. – Kelly Bundy Jan 27 '22 at 00:21
  • For example, doing `count += 1` a million times [takes over three times as long](https://tio.run/##bY7BCsIwDIbve4rc1qCTll1E6ZOIjCqd9tC0hA7q09dWB4r4QyAf5EsSH@keaNxHLmXm4CE5b10C52PgtFLXzYFhAkfAhm5WjHjooCayoyTeQ6K/hoUSbDSofgsraZAVaPEXy1pNUspWiH90s3spImPTc1WVbDmCqf1pGDL8vPHZd/46gljKEw) as making `list.count` do it. – Kelly Bundy Jan 27 '22 at 01:05
  • Sure. Now what if you include the time it takes to add the 1 to the list over and over, necessary for constructing the list before doing list.count()? More importantly, that comparison is only valid if all elements within the list is a `1`. Say the list has a length of one million but only contains a single `1`, then you should use `number=1` for the fist timing as well. – jmd_dk Jan 27 '22 at 01:25
  • You mean if you don't build the list at all? That's not what your answer is saying. But yes, then those "few ifs" are more likely to be beneficial. I'd have to see it to be able to estimate/measure it. And yes, I neglected timeit's overhead. [Unrolling the loop](https://tio.run/##bY7LCsIwEEX3/Yq7S6KtTOhGLP0SkVKl1SzyYEghfn2MtQsR72pmOHNmwjM@vGuPgXOe2VtEYycTYWzwHLeuqmbPGGAceHT3SbbqVKEksHFRfiApbn5xEfseuhPYQRPV2IY9SNRwi71O3GsaiEipP4rxsPIyqYKLVPaKpaTDWOpz0yT8vKLfrtV3@bqgVM4v) does have an effect, but `count += 1` is still almost three times slower than `list.count`. – Kelly Bundy Jan 27 '22 at 01:32
  • *"More importantly, that comparison is only valid if all elements within the list is a `1`"* - Ha. Silly me. I had actually done it with `1` at first, recognized that it gives `list.count` a huge advantage and changed it to counting `x` in a list of a million `x` duplicates. I forgot that the OP's code does use just `1`. So `list.count` is actually about [**13 times** faster](https://tio.run/##bYy9CgIxEIT7e4rpkughG64RJU8ickTJaYr8sOQKnz4meoWFA1vM8s2XX@WZ4nTMXOvCKaD44HyBDzlx2dowLIkxw0ewjQ8nJ3Ua0JLZxyK/kBT3tMaCvYE@C@ygiUZsTwMSI@Iabo6NppmIlPqjsIcPL7VquLBtd9HX7uqLfj8WpWp9Aw) than `count += 1` :-D – Kelly Bundy Jan 27 '22 at 01:41
  • Hmm, actually that means it's not a worst case for `list.count` as intended. I shall revisit this after sleeping – Kelly Bundy Jan 27 '22 at 03:38
  • Ok... worst case for `count += 1` is if it has to be done all the time, and worst case for `list.count(1)` is if there is *no* `1`. Then `list.count` seems [about 3.2 times faster](https://tio.run/##dYxBCsIwFET3PcXskmiRH7oRpScRKVFSzSI/4ZMuevqYqgs3DsxmmPfyWp6Jh2OWWmdJESVEHwpCzEkKxGfvStfNSTAhMMTxw@vBnDq0ZAlcdAysP0et7mnhgv0Ie1bYwRL1@I4jSPXgJd68jJYmIjLG/PG4wxvS1jRGuQZf6LoJN2zrj6pZan0B). If you could show the actual code you're thinking of, I'd love to do benchmarks with the whole solution rather than just something this isolated :-) – Kelly Bundy Jan 27 '22 at 12:00
  • @KellyBundy I'm talking generally. Regardless of how many `1`s the `list` contain, `list.count(1)` takes `O(n)` time. As these `1`s are deliberately inserted into the list to begin with, just keeping a tally of them is generally faster for large `list`s. – jmd_dk Jan 27 '22 at 12:31
  • Yes, but in the worst case, which is what matters for the time limit, `count += 1` also takes O(n) time. And if "on average", it has to be done for half of the elements, it'll still be 1.6 times slower. And yes, building the list perhaps takes more time than all the `count += 1`, but the list also allows to find the two indexes quickly. And like I said, not building the list isn't what your answer is saying. It actually rather says to *keep* building the list, by suggesting to use `append` and saying that it's *not* slow. – Kelly Bundy Jan 27 '22 at 13:33
1

Here's another way of looking at the problem. This scans left to right over the bins, and at each point, I track how many units of water are dammed up at each level. When there's a tall wall, we tally up whatever units it was damming, and clear them. However, this still gets an "overtime" flag on the next to the last test, which has about 10,000 entries. It takes 20 seconds on my relatively old box.

class Solution():
    def trap(self, height):
        trapped = 0
        accum = [0]*max(height)
        lastwall = -1

        for h in height:
            # Take credit for everything up to our height.
            trapped += sum(accum[0:h])
            accum[0:h] = [0]*h
            for v in range(h,lastwall):
                accum[v] += 1
            lastwall = max(lastwall,h)

        return trapped

print(Solution().trap([0,1,0,2,1,0,1,3,2,1,2,1])) # 6
print(Solution().trap([4,2,0,3,2,5])) # 9
Tim Roberts
  • 48,973
  • 4
  • 21
  • 30