Your current algorithm will give incorrect results for many strings.
I suspect that there's a more efficient way to solve this problem, but here's a brute-force solution. It generates subsets of the input string, ordered by length, descending; the elements in each subset retain their order from the original string. As soon as count_deletions finds a sorted subset it returns it (converted back into a string), along with the number of deletions. Because the subsets are visited from longest to shortest, the result is guaranteed to be at least as long as any other sorted subsequence of the input string.
Please see the itertools docs for info about the various itertools functions I've used; the algorithm for generating subsets was derived from the powerset example in the Recipes section.
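For reference, that recipe looks essentially like this; my count_deletions below reverses the range (and skips the empty subset) so that the longest subsets come first.

from itertools import chain, combinations

def powerset(iterable):
    "powerset([1,2,3]) --> () (1,) (2,) (3,) (1,2) (1,3) (2,3) (1,2,3)"
    s = list(iterable)
    return chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))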
from itertools import chain, combinations

def count_deletions(s):
    # Try subsets from longest to shortest; within each subset the characters
    # keep their original order, so the first sorted one found is a longest
    # sorted subsequence of s.
    for t in chain.from_iterable(combinations(s, r) for r in range(len(s), 0, -1)):
        t = list(t)
        if t == sorted(t):
            return ''.join(t), len(s) - len(t)

# Some test data.
data = [
    "abcdefg",
    "cba",
    "abcb",
    "vwzyx",
    "zvwzyx",
    "adabcef",
    "fantastic",
]

for s in data:
    print(s, count_deletions(s))
output
abcdefg ('abcdefg', 0)
cba ('c', 2)
abcb ('abc', 1)
vwzyx ('vwz', 2)
zvwzyx ('vwz', 3)
adabcef ('aabcef', 1)
fantastic ('fntt', 5)
That data set is not really adequate to fully test algorithms designed to solve this problem, but I guess it's an ok starting point. :)
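If you want to exercise the code a little harder, a few edge cases are worth adding. These are only suggestions, and I've kept them short because the brute-force version enumerates subsets and so blows up quickly on long strings:

extra_cases = [
    "",        # empty string: the brute-force version falls through and returns None here
    "a",       # single character
    "aaaa",    # all characters equal
    "zyxwvu",  # strictly decreasing, so only one character can survive
    "abab",    # repeated characters that are already partly in order
]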
Update
Here's a Python 3 implementation of the algorithm mentioned by Salvador Dali on the linked page. It's much faster than my previous brute-force approach, especially for longer strings.
We can find the longest sorted subsequence by sorting a copy of the string and then finding the Longest Common Subsequence (LCS) of the original string and the sorted string. For example, sorting "zvwzyx" gives "vwxyzz"; an LCS of those two strings, such as "vwz", has length 3, so 6 - 3 = 3 deletions are required. Salvador's version removes duplicate elements from the sorted string because he wants the result to be strictly increasing, but we don't need that here.
This code only returns the number of deletions required, but it's easy enough to modify it to return the actual sorted string; a sketch of one such modification is at the end of this answer.
To make this recursive function more efficient, it uses the lru_cache decorator from functools.
from functools import lru_cache

@lru_cache(maxsize=None)
def lcs_len(x, y):
    # Length of the Longest Common Subsequence of strings x and y.
    if not x or not y:
        return 0
    xhead, xtail = x[0], x[1:]
    yhead, ytail = y[0], y[1:]
    if xhead == yhead:
        return 1 + lcs_len(xtail, ytail)
    return max(lcs_len(x, ytail), lcs_len(xtail, y))

def count_deletions(s):
    # Reset the memo cache between calls so it doesn't keep growing.
    lcs_len.cache_clear()
    return len(s) - lcs_len(s, ''.join(sorted(s)))

data = [
    "abcdefg",
    "cba",
    "abcb",
    "vwzyx",
    "zvwzyx",
    "adabcef",
    "fantastic",
]

for s in data:
    print(s, count_deletions(s))
output
abcdefg 0
cba 2
abcb 1
vwzyx 2
zvwzyx 3
adabcef 1
fantastic 5
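As mentioned above, modifying the LCS approach to return the actual sorted string rather than just the deletion count is straightforward: compute the LCS itself instead of its length. Here's a minimal sketch of one way to do it; the lcs and sorted_subsequence names are introduced purely for illustration.

from functools import lru_cache

@lru_cache(maxsize=None)
def lcs(x, y):
    # Longest Common Subsequence of x and y, returned as a string.
    if not x or not y:
        return ''
    if x[0] == y[0]:
        return x[0] + lcs(x[1:], y[1:])
    a, b = lcs(x, y[1:]), lcs(x[1:], y)
    return a if len(a) >= len(b) else b

def sorted_subsequence(s):
    # A longest sorted subsequence of s is an LCS of s and its sorted copy.
    lcs.cache_clear()
    t = lcs(s, ''.join(sorted(s)))
    return t, len(s) - len(t)

print(sorted_subsequence("zvwzyx"))  # a 3-character subsequence such as 'vwz', plus 3 deletions

For very long strings you'd probably want an iterative LCS to avoid Python's recursion limit, but the idea is the same.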