Is there a better way than a while loop to perform this function?

Question

I was attempting some python exercises and I hit the 5s timeout on one of the tests. The function is pre-populated with the parameters and I am tasked with writing code that is fast enough to run within the max timeframe of 5s.

There are N dishes in a row on a kaiten belt, with the ith dish being of type Di. Some dishes may be of the same type as one another. The N dishes will arrive in front of you, one after another in order, and for each one you'll eat it as long as it isn't the same type as any of the previous K dishes you've eaten. You eat very fast, so you can consume a dish before the next one gets to you. Any dishes you choose not to eat as they pass will be eaten by others. Determine how many dishes you'll end up eating.

Issue

The code "works" but is not fast enough.

Code

The idea here is to add the D[i] entry if it is not in the pastDishes list (which can be of size K).

from typing import List
# Write any import statements here

def getMaximumEatenDishCount(N: int, D: List[int], K: int) -> int:
  # Write your code here
  numDishes=0
  pastDishes=[]
  i=0
  while (i<N):
    if(D[i] not in pastDishes):
      numDishes+=1
      pastDishes.append(D[i])
      if len(pastDishes)>K:
        pastDishes.pop(0)
    i+=1
  return numDishes

Is there a more effective way?

Consider using `set` not `list` for membership testing, if you don't need ordering. — Passerby, Jan 04 '22 at 00:32
You'll have to do `return numDishes, 0` instead of the two return statements. The way you have it if you need to `return 0` it won't get to that line. Or you can omit the second return statement if you don't actually need it. — stefan_aus_hannover, Jan 04 '22 at 00:33
Also look into `collections.deque` for storing the last K dishes. Removing the 0th index in a list is an O(n) operation. — rchome, Jan 04 '22 at 00:33
@stefan_aus_hannover thanks, I meant to delete it before copying and pasting. — Don Shappelle, Jan 04 '22 at 01:01
The [sample](//stackoverflow.com/help/mcve) should include sample data to be complete. — outis, Jan 04 '22 at 07:15

ZXYNINE · Answer 1 · 2022-01-04T12:06:47.720

2

After much trial and error, I have finally found a solution that is fast enough to pass the final case in the puzzle you are working on. My previous code was very neat and quick, however, I have finally found a module with a tool that makes this much faster. Its from collections just as deque is, however it is called Counter.

This was my original code:

def getMaximumEatenDishCount(N: int, D: list, K: int) -> int:
    numDishes=lastMod=0
    pastDishes=[0]*K
    for Dval in D:
        if Dval in pastDishes:continue
        pastDishes[lastMod] = Dval
        numDishes,lastMod = numDishes+1,(lastMod+1)%K
    return numDishes

I then implemented Counter like so:

from typing import List

# Write any import statements here
from collections import Counter

def getMaximumEatenDishCount(N: int, D: 'list[int]', K: int) -> int:
    eatCount=lastMod = 0
    pastDishes=[0]*K

    eatenCounts = Counter({0:K})
    for Dval in D:
        if Dval in eatenCounts:continue
        eatCount +=1
        eatenCounts[Dval] +=1

        val = pastDishes[lastMod]
        if eatenCounts[val] <= 1:   eatenCounts.pop(val)
        else:                       eatenCounts[val] -= 1
        
        pastDishes[lastMod]=Dval
        lastMod = (lastMod+1)%K
    return eatCount

Which ended up working quite well. I'm sure you can make it less clunky, but this should work fine on its own.

Some Explanation of what i am doing:

Typically while loops are actually marginally faster than a for loop, however since I need to access the value at an index multiple times if i used it, using a for loop I believe is actually better in this situation. You can see i also initialised the list to the max size it needs to be and am writing over the values instead of popping+appending which saves alot of time. Additionally, as pointed out by @outis, another small improvement was made in my code by using the modulo operator in conjunction with the variable which removes the need for an additional if statement. The Counter is essentially a special dict object that holds a hashable as the key and an int as the value. I use the fact that lastMod is an index to what would normally be accesed through list.pop(0) to access the object needed to either remove or decrement in the counter

Note that it is not considered 'pythonic' to assign multiple variable on one line, however I believe it adds a slight performance boost which is why I have done it. This can be argued though, see this post.

If anyone else is interested the problem that we were trying to solve, it can be found here: https://www.facebookrecruiting.com/portal/coding_puzzles/?puzzle=958513514962507

edited Jan 04 '22 at 12:06

answered Jan 04 '22 at 00:34

ZXYNINE

712
4
5

there may be repeats, per the instructions, just as long as the current dish is not one of the past K dishes. – Don Shappelle Jan 04 '22 at 01:03
I believe I posted that before the asker edited the question to include the extra information. – ZXYNINE Jan 04 '22 at 01:09
@DonShappelle Check out my updated responce and let me know if that works for you. deque is great for when you need to pop items from a list, however for this you do not need to alter the list in any way. – ZXYNINE Jan 04 '22 at 01:18
1

the added piece is that, if the dish value (int) is equal to a value of one of the last K dishes, it should not be eaten. If K is 2, for example, and the list D is 1,2,3,4,3,2,1,1, I would eat 1,2,3,4,2,1 so count would be 6. – Don Shappelle Jan 04 '22 at 01:20
@DonShappelle right, i missed that. All you would need to do is change the set to a list and do the same as you did in your original answer then, `if len(pastDishes)>K: pastDishes.pop(0)`. This may be quite close to your original code, however by using a for loop instead of hacking together one using a while loop, it should provide some additional speed since python bultins are written in c and are far faster than python code itself – ZXYNINE Jan 04 '22 at 01:24
@DonShappelle Also since now you do need to use pop i would definitly just use a deqe instead. All you need to do that is to put `from collections import deque` at the top of your script and use a deque instead of the list or set you are popping from. See my edited code above – ZXYNINE Jan 04 '22 at 01:27
your code fails 3 of the 33 tests due to time limit. My code fails 1 of the 33 tests due to time limit. Maddening! D can be between 1-1000000 and N can be between 1-500000. – Don Shappelle Jan 04 '22 at 01:43
@DonShappelle That truly is maddening lol, have you tried both versions? The Deque and the list comprehension? – ZXYNINE Jan 04 '22 at 01:45
I'll try the lc soon, thanks for banging your head against the wall with me! :) – Don Shappelle Jan 04 '22 at 01:49
@DonShappelle for the list comprehension you dont actually need the comprehension, try just using a for loop and an counter. Check my edit#20064. And Its a pleasure, I love good problems. – ZXYNINE Jan 04 '22 at 01:53
your edit #20064 fails on sample question 2: Sample test case #2 Wrong Answer N = 6 D = [1, 2, 3, 3, 2, 1] K = 2 Expected Return Value = 4 Returned Value = 5 – Don Shappelle Jan 04 '22 at 02:00
@DonShappelle I forgot to make sure you arent using negative numbers! The slice ends up subtracting k and becoming negative so it wraps around. Do a check beforehand that checks if the subtraction is negative and just setting it to 0 before using it in the left side of the slice. See me edit: "Too many edits later" – ZXYNINE Jan 04 '22 at 02:03
same result as before for question 2. Thanks for playing! :) I'm going to step away and come back to it with a fresh brain. – Don Shappelle Jan 04 '22 at 02:09
@ Thats odd, I think it was a simple if mistake, I actually added in a different version just below that while i think you were running the test and i tested it myself and got the responce 4 as expected and i did get 5 for the previous one so the last version i gave should do the trick. See if that works for you, thats about the best i think i can make it without the ability to test all of your cases myself. Was fun! – ZXYNINE Jan 04 '22 at 02:13
the slice notation on a list create a new list, that is a waste of time and memory, you can use the [index](https://docs.python.org/3/library/stdtypes.html#sequence-types-list-tuple-range) method instead, you can give it extra argument so it only look up in a subsection of the list, but it throw a value error if not found the element – Copperfield Jan 04 '22 at 02:36
@Copperfield ah, thats counter intuitive, thats a very good point, i didnt know about the index method, ill try that out. – ZXYNINE Jan 04 '22 at 02:39
@Copperfield Sadly thats a no go as it will raise a value error if the item is not in that range, and using a try:except would just bog the code down – ZXYNINE Jan 04 '22 at 02:42
@DonShappelle I found where you are running the tests and i spent a while myself trying to solve it. I edited my code above to reflect the fastest function i could create but it still wasnt enough to overcome that last case. Do you actually know if anyone has solved that puzzle before? I dont know if its possible to get much faster than what i did. I even swapped into C# to see if it was a matter of language speed and i got the same result (C# can be up to 4x faster than python). – ZXYNINE Jan 04 '22 at 06:13
@ZXYNINE: with the most recent edits, this post no longer answers the question, and is thus not a fit for SO's Q&A format. You should be able to delete this post. Later, if you come up with a solution, you should be able to edit the post and then flag it for undeletion. – outis Jan 04 '22 at 07:13
Note that the increment & wraparound of `lastMod` can be implemented simply as `lastMod = (lastMod+1) % K`. – outis Jan 04 '22 at 08:02
@outis the mod trick is cool, I didnt really think about using both a mod and a variable to do the wraparound. lastMod was originally made to replace the mod function i was using but with your method basically does the same as my if statement so cool. As for your other comment, Id argue my post does still respond to his original question which was essentially about better methods than a while loop (because he needs the code to run faster) for the problem. I provide the code for the best implementation I could make. and it out performs his at least in the tests I've run. – ZXYNINE Jan 04 '22 at 10:14
@DonShappelle I actually solved it! Take a look at my (hopefully) final edit on this answer smh. – ZXYNINE Jan 04 '22 at 12:07

hc_dev · Answer 2 · 2022-01-04T02:16:26.100

Can we use an appropriate data structure? If so:

Data structures

Seems like an ordered set which you have to shrink to a capacity restriction of K.

To meet that, if exceeds (len(ordered_set) > K) we have to remove the first n items where n = len(ordered_set) - K. Ideally the removal will perform in O(1).

However since removal on a set is in unordered fashion. We first transform it to a list. A list containing unique elements in the order of appearance in their original sequence.

From that ordered list we can then remove the first n elements.

For example: the function lru returns the least-recently-used items for a sequence seq limited by capacity-limit k.

To obtain the length we can simply call len() on that LRU return value:

maximumEatenDishCount = len(lru(seq, k))

Using `set` for uniqueness (up to Python 3.6)

def lru(seq, k):
    return list(set(seq))[:k]

Using `dict` for uniqueness (since Python 3.6)

Same mechanics as above, but using the preserved insertion order of dicts since 3.7:

using OrderedDict explicitly

from collections import OrderedDict

def lru(seq, k):
    return list(OrderedDict.fromkeys(seq).keys())[:k]

using dict factory-method:

def lru(seq, k):
    return list(dict.fromkeys(seq).keys())[:k]

using dict-comprehension:

def lru(seq, k):
    return list({i:0 for i in seq}.keys())[:k]

See also:

I am not very experienced with Python and profiling ... maybe I misunderstood something when simplifying so much. — hc_dev, Jan 04 '22 at 01:36
Python 3.6+ have dicts that maintain insertion order. Use those instead of a set. — dawg, Jan 04 '22 at 01:48

outis · Answer 3 · 2022-01-07T03:45:00.033

As the problem is an exercise, exact solutions are not included. Instead, strategies are described.

There are at least a couple potential approaches:

Use a data structure that supports fast containment testing (a set in use, if not in name) limited to the K most recently eaten dishes. Fortunately, since dict preserves insertion order in newer Python versions and testing key containment is fast, it will fit the bill. dict requires that keys be hashable, but since the problem uses ints to represent dish types, that requirement is met.

With this approach, the algorithm in the question remains unchanged.
Rather than checking whether the next dish type is any of the last K dishes, check whether the last time the next dish was eaten is within K of the current plate count. If it is, skip the dish. If not, eat the dish (update both the record of when the next dish was last eaten and the current dish count). In terms of data structures, the program will need to keep a record of when any given dish type was last eaten (initialized to -K-1 to ensure that the first time a dish type is encountered it will be eaten; defaultdict can be very useful for this).

With this approach, the algorithm is slightly different. The code ends up being slightly shorter, as there's no shortening of the data structure storing information about the dishes as there is in the original algorithm.

There are two takeaways from the latter approach that might be applied when solving other problems:

More broadly, reframing a problem (such as from "the dish is in the last K dishes eaten" to "the dish was last eaten within K dishes of now") can result in a simpler approach.
Less broadly, sometimes it's more efficient to work with a flipped data structure, swapping keys/indices and values.

Approach & takeaway 2 both remind me of a substring search algorithm (the name escapes me) that uses a table of positions in the needle (the string to search for) of where each character first appears (for characters not in the string, the table has the length of the string); when a mismatch occurs, the algorithm uses the table to align the substring with the mismatching character, then starts checking at the start of the substring. It's not the most efficient string search algorithm, but it's simple and more efficient than the naive algorithm. It's similar to but simpler and less efficient than the skip search algorithm, which uses the positions of every occurrence of each character in the needle.

score 0 · Answer 4 · answered Jan 11 '22 at 18:47

from typing import List
# Write any import statements here
from collections import deque, Counter

def getMaximumEatenDishCount(N: int, D: List[int], K: int) -> int:
  # Write your code here
  q = deque()
  cnt = 0
  dish_counter = Counter()
  for d in D:
    if dish_counter[d] == 0:
      cnt += 1
      q.append(d)
      dish_counter[d] += 1
      if len(q) == K + 1:
        remove = q.popleft()
        dish_counter[remove] -= 1

  return cnt

Is there a better way than a while loop to perform this function?

Issue

Code

4 Answers4

Data structures

Using set for uniqueness (up to Python 3.6)

Using dict for uniqueness (since Python 3.6)

Using `set` for uniqueness (up to Python 3.6)

Using `dict` for uniqueness (since Python 3.6)