Analysis / Idea
Your median(expenditure[i:d+i])
is the culprit, as that takes O(d log d) time for sorting the entire unsorted slice of size d every time. You can reduce it to O(log d) by keeping the current window of trailing elements for example in a SortedList. You get the median from the middle element or two, and to update, just add one new element and remove the oldest element.
Implementation
from sortedcontainers import SortedList
n = 9
d = 5
expenditure = [2, 3, 4, 2, 3, 6, 8, 4, 5]
count = 0
trailing = SortedList(expenditure[:d])
half = d // 2
for i in range(d, n):
median = (trailing[half] + trailing[~half]) / 2
if expenditure[i] >= 2 * median:
count += 1
trailing.add(expenditure[i])
trailing.remove(expenditure[i - d])
print(count)
We could leave out the / 2
and the 2 *
, but then "median" would be the wrong name, and naming things is hard. We could do if expenditure[i] >= trailing[half] + trailing[~half]
, but I find it less clear.
Output
If you add
print(f'{trailing=} {median=} {expenditure[i]=}')
after the median = ...
line, you can see what's going on:
trailing=SortedList([2, 2, 3, 3, 4]) median=3.0 expenditure[i]=6
trailing=SortedList([2, 3, 3, 4, 6]) median=3.0 expenditure[i]=8
trailing=SortedList([2, 3, 4, 6, 8]) median=4.0 expenditure[i]=4
trailing=SortedList([2, 3, 4, 6, 8]) median=4.0 expenditure[i]=5
2
Alternative implementation
Using zip
instead of indexes:
count = 0
trailing = SortedList(expenditure[:d])
half = d // 2
for today, oldest in zip(expenditure[d:], expenditure):
median = (trailing[half] + trailing[~half]) / 2
if today >= 2 * median:
count += 1
trailing.add(today)
trailing.remove(oldest)
print(count)
Alternative data structure: sorted regular list
I found the problem at HackerRank, which doesn't have sortedcontainers
. But the following gets accepted there.
We can use a regular Python list
but keep it sorted ourselves with the help of sorted
and bisect
included in Python's standard library:
from bisect import bisect_left, insort
count = 0
trailing = sorted(expenditure[:d])
half = d // 2
for today, oldest in zip(expenditure[d:], expenditure):
median = (trailing[half] + trailing[~half]) / 2
if today >= 2 * median:
count += 1
insort(trailing, today)
del trailing[bisect_left(trailing, oldest)]
print(count)
This takes O(1) time for accessing the middle element(s), O(log d) for finding the insertion/deletion indexes, and O(d) time for the actual insertion/deletion (since it needs to shift all elements on the right of the index). But that O(d) shifting is very fast very low level.
Two more: sorted bytearray and counting sort
The question originally didn't include the reference to HackerRank. Now that I see that the values are limited to integers from 0 to 200, we can also use a bytearray
:
trailing = bytearray(sorted(expenditure[:d]))
And as I just saw pointed out in the discussions there, for this range of allowed values we can also use a form of counting sort. I think a Fenwick tree would make this particularly fast, I might try that later.
Benchmark
In the comments you mentioned n=200000 and d=10122 as a large case. So I tested with this data:
n = 200000
d = 10122
expenditure = random.choices(range(201), k=n)
Benchmark of my solutions:
at replit.com on my weak laptop
SortedList + indexing ~1.8 seconds ~6.4 seconds
SortedList + zipping ~1.8 seconds ~6.4 seconds
sorted regular list ~0.6 seconds ~8.8 seconds
sorted bytearray ~0.3 seconds ~1.7 seconds
Not sure why the regular list solution was so relatively slow on my laptop. I suspect it exceeds my CPU's level 1 caches.