5

In Python 3, I have a list of strings and would find it useful to be able to append a sentinel that would compare greater than all elements in the list.

Is there a straightforward way to construct such an object?

I can define a class, possibly subclassing str, but it feels like there ought to be a simpler way.

For this to be useful in simplifying my algorithm, I need to do this ahead of time, before I know what the strings contained in the list are going to be (and so it can't be a function of those strings).

NPE
  • 486,780
  • 108
  • 951
  • 1,012
  • Do you need a specific implementation of `str.__gt__()` or you want to construct a `string` that results to greater than any other `string`? – pissall Nov 08 '19 at 04:08
  • You don't need to subclass `str`, and object would do, that implements the comparison operators accordingly. – juanpa.arrivillaga Nov 08 '19 at 04:12

3 Answers3

4

This is kind of a naïve answer, but when you're dealing with numbers and need a sentinel value for comparison purposes, it's not uncommon to use the largest (or smallest) number that a specific number type can hold.

Python strings are compared lexicographically, so to create a "max string", you'd simply need to create a long string of the "max char":

MAX_CHAR = chr(sys.maxunicode)

# One million is entirely arbitrary here.
# It should ideally be 1 + the length of the longest possible string that you'll compare against 
MAX_STRING = MAX_CHAR * int(1e6)

Unless there's weird corner cases that I'm not aware of, MAX_STRING should now be considered greater than any other string (other than itself); provided that it's long enough.

wim
  • 338,267
  • 99
  • 616
  • 750
Carcigenicate
  • 43,494
  • 9
  • 68
  • 117
  • Why bother making it long? If that's correct at all, which it probably is because of the probably not otherwise used character, then length 1 suffices. – Kelly Bundy Feb 26 '23 at 15:34
  • 2
    @KellyBundy I made it long for the odd chance that they had a string starting with `\U0010ffff` characters. – Carcigenicate Feb 26 '23 at 15:40
2

You can inline objects of a custom class:

from functools import total_ordering

M = total_ordering(type('maxstring', (), {'__lt__': lambda *_: False}))()

print(sorted([M, 'abc', M, 'zyx']))
# ['abc', 'zyx', <__main__.maxstring object>, <__main__.maxstring object>]

Unfortunately, if you're going to inline this object more than once, and want to make sure that multiple maxstring objects compare equal, you should fix its comparison == operator:

M1 = total_ordering(type('maxstring', (), {'__lt__': lambda *_: False}))()
M2 = total_ordering(type('maxstring', (), {'__lt__': lambda *_: False}))()

print(M1 == M1)
# True
print(M1 == M2)
# False

M1 = total_ordering(type('maxstring', (), {'__lt__': lambda *_: False, '__eq__': lambda s,o: type(o).__name__=='maxstring'}))()
M2 = total_ordering(type('maxstring', (), {'__lt__': lambda *_: False, '__eq__': lambda s,o: type(o).__name__=='maxstring'}))()

print(M1 == M2)
# True

Important note: Do not make your maxstring an instance of str, otherwise it will be treated half like a maxstring and half like an empty string, and this will mess up the comparisons:

M = total_ordering(type('maxstring', (str,), {'__lt__': lambda *_: False}))()

print(isinstance(M, str))
# True
print(sorted([M, 'abc', M, 'zyx']))
# ['', 'abc', '', 'zyx']

print((M < 'abc'), (M > 'abc'), ('abc' < M), ('abc' > M))
# False False False False
Stef
  • 13,242
  • 2
  • 17
  • 28
2

For fun, here's an hack that depends on implementation details. I build a character/string that's not possible without such a hack:

import ctypes

# From https://stackoverflow.com/a/62188431
def deref(addr, typ):
    return ctypes.cast(addr, ctypes.POINTER(typ))

# Largest allowed character
S = chr(0x10FFFF)

# Make it larger
deref(id(S), ctypes.c_int)[18] += 1

Let's take it for a spin:

import random

# Show that it's indeed 0x110000
print(hex(ord(S)))

# Test that it's larger than all others
chars = [S, *map(chr, range(0x110000))]
for _ in range(3):
    random.shuffle(chars)
    print(max(chars) is S)

# Show that it's normally impossible
chr(ord(S))

Output (try it):

0x110000
True
True
True
Traceback (most recent call last):
  File "/ATO/code", line 18, in <module>
    chr(ord(S))
ValueError: chr() arg not in range(0x110000)
Kelly Bundy
  • 23,480
  • 7
  • 29
  • 65