Given a few strings, how many strings can be lexicographically least by modifying the alphabet?

Question

Number of strings can be huge as in 30000. Given N strings, output which ones can be lexicographically least after modifying the english alphabet. e.g. acdbe......

for example if the strings were:

omm
moo
mom
ommnom

"mom" is already lexicographically least with the original english alphabet. we can make the word "omm" least by switching "m" and "o" in the alphabet ("abcdefghijklonmpqrstuvwxyz"). the other ones you cant make lexicographically First, no matter what you do.

any fast way to do this? I have no ways to approach this except try every single possible alphabet

Every *unique* string can be made "lexicographically least". If two (or more) strings share a common prefix, *only one of them* can be made lexicographically least. Unless the suffix starts with a "new" letter. — wildplasser, Dec 16 '12 at 23:29
Yes I noticed that too. also going with the idea, if a string is a substring of another, the longer one can not be lexi-least — Jackson W, Dec 16 '12 at 23:31
What should your program output actually with the inputs above? — Salih Erikci, Dec 17 '12 at 00:06
{"mom", "omm"}. I'm pretty sure it's solvable using wildplasser's approach. — Jackson W, Dec 17 '12 at 00:16

score 0 · Answer 1 · edited May 23 '17 at 12:04

You want to be able to say weather one of the strings (say s) can be made smaller. You can turn this into a problem for detection of cycles in a graph. Here is how:

Consider the graph consisting of 26 vertices: V = { 'a', 'b', ..., 'z' } Suppose that s is the smallest and consider the letters of s in order, first then second ect.

When you are considering the first letter you know that s[0] should be smaller than x[0] for all other strings x. Now add an edge in the graph from s[0] to x[0] for all x for which they are not the same letter. The edge means "smaller than". Now consider the second letter but only look at those string that had the same first letter as s. Do the same thing - add an edge from s1 to y1 for all y in this reduces set. Then move to the third letter and consider only the strings that had the same first and second letter.. And so on (you can use a Set data structure which first contains all the strings but s and you would remove from it as you go).

Now after you have the graph built you want to know if there is a cycle or not. Indeed, if there is a cycle this means that this s cannot be the smallest, because following the cycle you will have to make the first letter in the cycle smaller than itself, which is impossible. On the other hand if you don't have a cycle, then the graph is a DAG (directed acyclic graph) and any DAG can be topologically sorted, which means its vertices (i.e. the letters of the alphabet) can be ordered in such a way so that any edge goes from smaller to larger. With this order you'll have s be the smallest because of how the graph was constructed.

You can look about detecting cycles in directed graphs - it's very common problem, whose complexity is O(|V|+|E|). In this case |V| = 26 and |E| <= n (because every time when you add an edge in the graph you reduce the set of the strings to check by one).

Thus the complexity is O(n).

If you want to check each of the strings you'll get an overall complexity of O(n^2).

Given a few strings, how many strings can be lexicographically least by modifying the alphabet?

1 Answers1