Longest common prefix based on elements

Question

I have an array of string elements in an array:

["000", "1110", "01", "001", "110", "11"]

For an element in the array,

I want to find the previous element index with longest common prefix.
If I have multiple matching elements then select the nearest element index.
If none found then just select previous index.

Example:

["000", "1110", "01", "001", "110", "11"]

Output:
[0,1,1,1,2,5]

a) "000" - output is 0, because we do not have any previous elements.
b) "1110" - output is 1, no previous element with longest prefix so select previous index.
c) "01" - output is 1,"000" has longest prefix, so its index is 1.
d) "001" - output is 1, "000" has longest prefix, so its index is 1.
e) "110" - output is 2,  "1110" has longest prefix, so its index 2.
f) "11" - output 5, "110" is most nearest element with longest prefix so its index 5.

I am not able to understand what approach I need to take for this task. Can you please help me.

Is it correct that in your example you start indexing from 1 and not 0? Why is your `commonPrefix` returning a String and not a `int[]` or `List` as you have shown in your example? — Hawk, Aug 25 '20 at 21:13
yes starting index is 1. I am starting with existing code I have in geekforgeeks, but that is not the correct approach to follow, so got stuck where to start. — learner, Aug 25 '20 at 21:15
What you are trying to do in each method is different than what the method comment and signature says. I presume these were already prepared for you, do you understand those comments? — Hawk, Aug 25 '20 at 21:19
@Hawk, I have added another code now to my post, but its time complexity is more, how to improve this now. — learner, Aug 25 '20 at 21:40

Hawk · Answer 1 · 2020-08-26T08:30:04.093

Naive solution based on the former question

commonPrefix is supposed to (according to the comment) the longest prefix in the array up to index n. What does that mean? You need to calculate all prefixes and pick the longest.

static String commonPrefix(String arr[], int n) {
    String longestPrefix = "";
    for (int i = 0; i < n; i++) {
        final String currentPrefix = commonPrefixUtil(arr[i], arr[n]);
        if (currentPrefix.length() > longestPrefix.length()) {
            longestPrefix = currentPrefix;
        }
    }
    return longestPrefix;
}

So that would yield "00" for arr = ["000", "1110", "01", "001", "110", "11"]; n = 3.

Now that we have the longest prefix, what? We need to find the closest index to n that starts with that prefix...

static int closestIndex(String[] arr, String longestPrefix, int n) {
    for (int i = n - 1; i >= 0; i--) {
        if (arr[i].startsWith(longestPrefix)) {
            return i + 1; // + 1 because the solution wants starting index with 1
        }
    }
    return 0;
}

How to put it together? Just call the two methods for each input

public static void main(String[] args) {
    String[] words = { "000", "1110", "01", "001", "110", "11" };
    int[] output = new int[words.length];

    for (int i = 0; i < words.length; i++) {
        final String longestPrefix = commonPrefix(words, i);
        output[i] = closestIndex(words, longestPrefix, i);
    }

    System.out.println(Arrays.toString(output));
}

You have deleted your commonPrefixUtil implementation that worked from the question, so I add my own:

static String commonPrefixUtil(String str1, String str2) {
    int shorterStringLength = Math.min(str1.length(), str2.length());
    int length = 0;
    for (; length < shorterStringLength; length++) {
        if (str1.charAt(length) != str2.charAt(length)) {
            break;
        }
    }
    return str1.substring(0, length);
}

Optimalized solution

I created a new solution using dynamic programming with tabulation (if I understand correctly), that is I use a hashmap of already all prefixes pointing to the indexes of the words where the prefix is from. The value of the Map is a sorted tree, so it can be really easily determined which word with the common prefix is closest to the current index. The HashMap ensures constant-time operations and the TreeSet guarantees log(n) time cost.

Explained more simply, I process the first letter of all words, and then the next etc. During this process I memorise where all prefix substrings are, while they are automatically sorted. I stop the loop after processing the last letter of the longest word.

public static void main(String[] args) {
    String[] words = { "000", "1110", "01", "001", "110", "11" };

    int[] result = new int[words.length];
    final int firstWordLength = words.length > 0 ? words[0].length() : 8;
    // prefix -> [indexes of prefix occurrence]
    HashMap<String, TreeSet<Integer>> prefixes = new HashMap<>(words.length * (firstWordLength + 1) * 2);
    int wordLength = 1;
    boolean isUpdatedResult;
    do { // O(k)
        isUpdatedResult = false;
        for (int wordIdx = 0; wordIdx < words.length; wordIdx++) { // O(n)
            if (words[wordIdx].length() < wordLength) {
                continue;
            }
            final String currentPrefix = words[wordIdx].substring(0, wordLength); // Java >= 7 update 6 ? O(k) : O(1)
            final TreeSet<Integer> prefixIndexes = prefixes.get(currentPrefix); // O(1)
            if (prefixIndexes != null) {
                // floor instead of lower, because we have put `wordIdx + 1` inside
                final Integer closestPrefixIndex = prefixIndexes.floor(wordIdx); // O(log n)
                if (closestPrefixIndex != null) {
                    result[wordIdx] = closestPrefixIndex;
                    isUpdatedResult = true;
                }
            }
            // take the previous index for the result if no match
            if (result[wordIdx] == 0) {
                result[wordIdx] = wordIdx;
            }
            final TreeSet<Integer> newPrefixIndexes = prefixes.computeIfAbsent(currentPrefix, p -> new TreeSet<>()); // O(1)
            // the result index must be + 1
            newPrefixIndexes.add(wordIdx + 1); // O(log n)
        }
        wordLength++;
    } while (isUpdatedResult);

    System.out.println(Arrays.toString(result));
}

I have marked all the operations with their big O time complexity. n is the number of words in the input array, i.e. words.length and k is the length of the longest word. According to Jon Skeet's post the time complexity of substring has changed in Java 7 update 6 to linear.

So we can calculate:

O(k) * O(n) * (O(log n) + O(k))

Hopefully, the code is understandable and I calculated the complexity correctly.

Thanks, but we have almost 4 loops, can the time complexity be reduced? — learner, Aug 25 '20 at 22:14
I have created a more optimalized solution, please check it out :) — Hawk, Aug 26 '20 at 02:37
You should use `prefixes.computeIfAbsent(currentPrefix, k -> new TreeSet<>())` insted of `prefixes.getOrDefault(currentPrefix, new TreeSet<>())` — , Aug 26 '20 at 03:34

David Eisenstat · Answer 2 · 2020-08-26T00:20:01.253

Using a trie makes it pretty easy to find the longest common prefix so far in time linear in the total number of characters in the input. In Python (sorry):

import collections


class Trie:
    def __init__(self):
        self._children = collections.defaultdict(Trie)
        self._previous_index = 0

    # Find the longest prefix of word that appears in the trie,
    # return the value of _previous_index at that node.
    def previous_index(self, word):
        node = self
        for letter in word:
            child = node._children.get(letter)
            if child is None:
                break
            node = child
        return node._previous_index

    # Ensure that each of the prefixes of word exists in the trie.
    # At each node corresponding to a prefix, set its _previous_index to index.
    def insert(self, index, word):
        self._previous_index = index
        node = self
        for letter in word:
            node = node._children[letter]
            node._previous_index = index


def longest_common_prefix_indexes(words):
    trie = Trie()
    for index_minus_one, word in enumerate(words):
        print(trie.previous_index(word))
        trie.insert(index_minus_one + 1, word)


longest_common_prefix_indexes(["000", "1110", "01", "001", "110", "11"])

can you please add comments so I will try to do this in Java — learner, Aug 26 '20 at 00:14
Can you please check your claim about the linear complexity? I think the code you have shown is O(n*k) (insert for each word, for each letter). And I don't know what is the complexity of get (from a quick search I see O(key_length) but really not sure). — Hawk, Aug 26 '20 at 02:51
@Hawk linear in the number of letters, not the number of words. — David Eisenstat, Aug 26 '20 at 02:58
@Hawk get is constant-time (for sure if we assume an O(1)-size alphabet, but in a randomized sense for larger alphabets). — David Eisenstat, Aug 26 '20 at 02:59

score 0 · Answer 3 · answered Aug 25 '20 at 21:47

I think this might work for you

This is funcion that returns the length of the longest common prefix of two Strings:

int commonPrefixLength(String s, String t) {
    if (s.length() == 0 || t.length() == 0) {
        return 0;
    }

    int commonPrefixLength = 0;
    int shorter = Math.min(s.length(), t.length());

    for (int i = 0; i < shorter; i++) {
        if (s.charAt(i) != t.charAt(i)) {
            break;
        }
        commonPrefixLength++;
    }

    return commonPrefixLength;
}

For every string in array you can check previous strings and choose the one with to longest common prefix:

    //indexing from 1
    String[] strings = new String[] {"", "000", "1110", "01", "001", "110", "11"};

    for (int i = 1; i < strings.length; i++) {
        int longestCommonPrefix = 0;
        int answer = 0;

        //for every previous string
        for (int j = i - 1; j > 0; j--) {
            int commonPrefix = commonPrefixLength(strings[j], strings[i]);
            if (commonPrefix > longestCommonPrefix) {
                longestCommonPrefix = commonPrefix;
                answer = j;
            }
        }

        //no common prefix found
        if (answer == 0) {
            answer = i - 1;
        }

        System.out.println(answer);
    }

Longest common prefix based on elements

3 Answers3

Naive solution based on the former question

Optimalized solution