In (pure) Python (since that's what the question is tagged with), you're likely not going beat that naive/straightforward method of calculating the images and then sorting them. Except for special inputs, it's even going to take only linear time, thanks to Timsort recognizing the up to three monotone parts and just merging them.
Let's benchmark @MBo's example x^3 - 60000x^2 + x + 1, x=-10000..50000
, with x-steps of 0.01, so we have six million points:
Round 1 Round 2 Round 3
1156 ms 1157 ms 1168 ms transform
201 ms 202 ms 203 ms copy+sort
92 ms 94 ms 93 ms copy
So here, even the transformation already took over 10 times as long as the sorting (1160 ms vs 202-93=109 ms, I copy the transformed values before sorting because I test repeatedly).
If you try to identify and merge the monotone parts yourself, that'd replace the 109 ms with something more like the 1160 ms, i.e., be slower.
Code (Try it online!):
from timeit import repeat
def transform(a, b, c, d, X):
return [((a*x + b)*x + c)*x + d
for x in X]
funcs = [
('transform', lambda: transform(a, b, c, d, X)),
('copy+sort', lambda: Y.copy().sort()),
('copy', lambda: Y.copy()),
]
a, b, c, d = 1., -60000., 1., 1.
X = [i / 100 for i in range(-1_000_000, 5_000_001)]
Y = transform(a, b, c, d, X)
number = 1
tss = [[] for _ in funcs]
for _ in range(3):
print(' Round 1 Round 2 Round 3')
for (label, func), ts in zip(funcs, tss):
t = min(repeat(func, number=number)) / number
ts.append(t)
print(*('%5d ms ' % (t * 1e3) for t in ts), label)
print()