Your current algorithm will give incorrect results for many strings.
I suspect that there's a more efficient way to solve this problem, but here's a brute-force solution. It generates subsets of the input string, ordered by length, descending; the elements in each subset retain their order from the original string. As soon as count_deletions finds a sorted subset it returns it (converted back into a string), along with the number of deletions. Because the subsets are visited from longest to shortest, the result is guaranteed to be at least as long as any other sorted subsequence of the input string.
Please see the itertools docs for info about the various itertools functions I've used; the algorithm for generating subsets was derived from the powerset example in the Recipes section.
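For reference, that recipe looks essentially like this; my count_deletions below reverses the range (and skips the empty subset) so that the longest subsets come first.

from itertools import chain, combinations

def powerset(iterable):
    "powerset([1,2,3]) --> () (1,) (2,) (3,) (1,2) (1,3) (2,3) (1,2,3)"
    s = list(iterable)
    return chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))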
from itertools import chain, combinations

def count_deletions(s):
    # Try subsets from longest to shortest; within each subset the characters
    # keep their original order, so the first sorted one found is a longest
    # sorted subsequence of s.
    for t in chain.from_iterable(combinations(s, r) for r in range(len(s), 0, -1)):
        t = list(t)
        if t == sorted(t):
            return ''.join(t), len(s) - len(t)

# Some test data.
data = [
    "abcdefg",
    "cba",
    "abcb",
    "vwzyx",
    "zvwzyx",
    "adabcef",
    "fantastic",
]

for s in data:
    print(s, count_deletions(s))
output
abcdefg ('abcdefg', 0)
cba ('c', 2)
abcb ('abc', 1)
vwzyx ('vwz', 2)
zvwzyx ('vwz', 3)
adabcef ('aabcef', 1)
fantastic ('fntt', 5)
That data set is not really adequate to fully test algorithms designed to solve this problem, but I guess it's an ok starting point. :)
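If you want to exercise the code a little harder, a few edge cases are worth adding. These are only suggestions, and I've kept them short because the brute-force version enumerates subsets and so blows up quickly on long strings:

extra_cases = [
    "",        # empty string: the brute-force version falls through and returns None here
    "a",       # single character
    "aaaa",    # all characters equal
    "zyxwvu",  # strictly decreasing, so only one character can survive
    "abab",    # repeated characters that are already partly in order
]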
Update
Here's a Python 3 implementation of the algorithm mentioned by Salvador Dali on the linked page. It's much faster than my previous brute-force approach, especially for longer strings.
We can find the longest sorted subsequence by sorting a copy of the string and then finding the Longest Common Subsequence (LCS) of the original string and the sorted string. For example, sorting "zvwzyx" gives "vwxyzz"; an LCS of those two strings, such as "vwz", has length 3, so 6 - 3 = 3 deletions are required. Salvador's version removes duplicate elements from the sorted string because he wants the result to be strictly increasing, but we don't need that here.
This code only returns the number of deletions required, but it's easy enough to modify it to return the actual sorted string; a sketch of one such modification is at the end of this answer.
To make this recursive function more efficient, it uses the lru_cache decorator from functools.
from functools import lru_cache

@lru_cache(maxsize=None)
def lcs_len(x, y):
    # Length of the Longest Common Subsequence of strings x and y.
    if not x or not y:
        return 0
    xhead, xtail = x[0], x[1:]
    yhead, ytail = y[0], y[1:]
    if xhead == yhead:
        return 1 + lcs_len(xtail, ytail)
    return max(lcs_len(x, ytail), lcs_len(xtail, y))

def count_deletions(s):
    # Reset the memo cache between calls so it doesn't keep growing.
    lcs_len.cache_clear()
    return len(s) - lcs_len(s, ''.join(sorted(s)))

data = [
    "abcdefg",
    "cba",
    "abcb",
    "vwzyx",
    "zvwzyx",
    "adabcef",
    "fantastic",
]

for s in data:
    print(s, count_deletions(s))
output
abcdefg 0
cba 2
abcb 1
vwzyx 2
zvwzyx 3
adabcef 1
fantastic 5
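As mentioned above, modifying the LCS approach to return the actual sorted string rather than just the deletion count is straightforward: compute the LCS itself instead of its length. Here's a minimal sketch of one way to do it; the lcs and sorted_subsequence names are introduced purely for illustration.

from functools import lru_cache

@lru_cache(maxsize=None)
def lcs(x, y):
    # Longest Common Subsequence of x and y, returned as a string.
    if not x or not y:
        return ''
    if x[0] == y[0]:
        return x[0] + lcs(x[1:], y[1:])
    a, b = lcs(x, y[1:]), lcs(x[1:], y)
    return a if len(a) >= len(b) else b

def sorted_subsequence(s):
    # A longest sorted subsequence of s is an LCS of s and its sorted copy.
    lcs.cache_clear()
    t = lcs(s, ''.join(sorted(s)))
    return t, len(s) - len(t)

print(sorted_subsequence("zvwzyx"))  # a 3-character subsequence such as 'vwz', plus 3 deletions

For very long strings you'd probably want an iterative LCS to avoid Python's recursion limit, but the idea is the same.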