0

I have created below logic to find if the combination of 2 strings has all the digits from 0-9 at least once. But I think this is very naive and need improvements in performance. Can you please suggest better solution and anything wrong with my solution. Thanks.

Input: Array of strings with digits(Ex: 012345,6789,34567). And I am trying to find how many pairs of strings will have all the digits 0-9 at least once.(In Ex: 1 pair -1st and 2nd).

static long getNumberOfValidPairs(String[] tickets) {
        long count=0;
        for(int i=0;i<tickets.length-1;i++){
           for(int j=i+1;j<tickets.length;j++){
               String concat = tickets[i]+tickets[j];
               if(concat.length() <10){
                   continue;
               }
               if(concat.contains("0") && concat.contains("1") && concat.contains("2") && concat.contains("3") && concat.contains("4") && concat.contains("5") && concat.contains("6") && concat.contains("7") && concat.contains("8") && concat.contains("9")){
                   count++;
               }
           }
       }
        return count;
    }

Improved solution:

static long getNumberOfValidPairs(String[] tickets) {
        long count=0;
        short[] masks = new short[tickets.length];
        char[] chs = null;
        short mask = 0;
        short mask_full = (short) 0b1111111111;
        for(int i=0;i<tickets.length;i++){
            chs = tickets[i].toCharArray();
            mask = 0;
            for(char ch:chs){
                if (ch >= '0' && ch <= '9') {
                int digit = ch - '0';
                mask |= (1 << digit);
            }

            }
            masks[i] = mask;
        }

        for(int i=0;i<tickets.length-1;i++){
            short mask_i = masks[i];
           for(int j=i+1;j<tickets.length;j++){
               short mask_j = masks[j];
               short mask_i_j_concatenated = (short) (mask_i | mask_j);
            if (mask_i_j_concatenated == mask_full) {
               // System.out.println("Strings [" + string_i + "] and [" + string_j + "] form a pair.");
                count++;
            }
           }
       }
        return count;
    }
A MJ
  • 115
  • 1
  • 9

5 Answers5

3

I think that the time complexity here is O(n^2) since you will need to try all pairs. So two for cycles as you did it are OK.

So basically the only thing you could improve is checking if a two strings form a pair. At the moment you're doing it by concatenating and then searching for each of the 0-9 digits. This is not quite optimal as you create unnecessary string and also search for each of the digits, basically scanning the string 10 times.

What you could do instead is creating a bitmask for each of the strings, where the bit at position i shows if i is present in the string or not. Then you can check if concatenation contains all the digits by simple OR-ing two bit masks and checking if the result is 2^10-1 i.e. 1023. Since you only need to calculate bit masks once and | operation is fast, this will be better than concatenating and scanning for digits.

Some code. Assume we have a list of strings as follows:

    List<String> strings = Arrays.asList("012345","6789","34567");

This is how you create bit masks:

    short[] masks = new short[strings.size()];
    for (int i = 0; i < strings.size(); i++) {
        String str = strings.get(i);
        char[] chs = str.toCharArray();
        short mask = 0;
        for (int index = 0; index < chs.length; index++) {
            char ch = chs[index];
            if (ch >= '0' && ch <= '9') {
                int digit = ch - '0';
                mask |= (1 << digit);
            }
        }
        masks[i] = mask;
    }

This is how you check for pairs:

    short mask_full = (short) 0b1111111111;

    for (int i = 0; i < strings.size() - 1; i++) {
        String string_i = strings.get(i);
        short mask_i = masks[i];

        for (int j = i; j < strings.size(); j++) {
            String string_j = strings.get(j);
            short mask_j = masks[j];

            short mask_i_j_concatenated = (short) (mask_i | mask_j);
            if (mask_i_j_concatenated == mask_full) {
                System.out.println("Strings [" + string_i + "] and [" + string_j + "] form a pair.");
            }
        }
    }

I've only sketched the code without much verification, so be careful.

lexicore
  • 42,748
  • 17
  • 132
  • 221
  • It is 2^11-1, not 2^10-1 – BKE Apr 14 '18 at 10:18
  • @BKE `0b1111111111` = `1023` = `2^10-1` or do I miss something? – lexicore Apr 14 '18 at 10:22
  • Wah! Though I took some time to understand masking concept, impressed by your answer . Exactly the kind of answer I was looking for . Thanks for your inputs, also a new learning for me today :-). I will try this concept and accept your answer. – A MJ Apr 14 '18 at 17:32
  • By the way just one more question, can we still improve the solution using may be java streams somewhere? – A MJ Apr 14 '18 at 18:38
  • @AMJ Parallel streams *might* improve performance, but it is really hard to say if this is a case without doing a [benchmark](https://stackoverflow.com/questions/504103/how-do-i-write-a-correct-micro-benchmark-in-java). Which is pretty hard to do correctly. – lexicore Apr 14 '18 at 20:09
3

This can indeed be solved faster than O (input_length^2), where input_length is the total length of all the given strings.

Here is a solution in O (input_length + 2^{digits * 2}), where digits is 10, the number of different digits. So, the term 2^{digits * 2} is essentially a constant which does not depend on the size of the input.

First, for each string, calculate the corresponding mask: an integer from 0 to 1023 (which is 2^{10} - 1) where bit i is set if the string contains digit i. For example, the string 12153 has mask 0000101110 in binary, which is 2^5 + 2^3 + 2^2 + 2^1 = 46 in decimal. This can be done on O (input_length). After that, we won't need the actual input strings anymore, and won't even need the individual masks themselves. What we are interested is the count of each mask from 0 to 1023.

Now, let the number of strings with mask m be f[m]. The answer can now be found as follows:

answer = f[1023] * (f[1023] - 1) / 2
for u = 0, 1, 2, ..., 1022:
    for v = u+1, u+2, ..., 1023:
        if u | v == 1023:
            answer += f[u] * f[v]

Indeed, the f[1023] strings which individually contain all the digits can be paired arbitrarily. If there are say 5 such strings, there are choose (5, 2) = 5 * (5 - 1) / 2 = 10 ways to make a pair out of them.

Now to the general case. Consider a string with mask u and a string with mask v, and u < v. They form a pair if bitwise OR of u and v is 1023, that is, has all bits from 0 to 9 set. So, if u | v = 1023, and there are f[u] strings with mask u and f[v] strings with mask v, there are f[u] * f[v] such pairs contributed by these two masks.


This solution can be optimized further, from O (input_length + 2^{digits * 2}) to O (input_length + 2^{digits} * digits), by first calculating g[v] as the sum of f[w] for all supersets w of v using dynamic programming.

Gassa
  • 8,546
  • 3
  • 29
  • 49
  • Thanks for your inputs. So, if my understanding is correct, we can use the solution provided in first part if the input size is more than 1023. And if the input size is less than that, may be we can just compare masks of each individual input pair(check 'improved solution' part in the question). Correct me if I am wrong. Then, 2nd thing can you please elaborate on your 'optimized further' solution(sorry i cudnt understand). – A MJ Apr 14 '18 at 19:16
  • 1
    Very cool. I had the idea with masks, to, but did not manage to make it subquadratic. `O (input_length + 2^{digits * 2})` totally makes sens for large `input_length`. – lexicore Apr 14 '18 at 20:40
  • @AMJ Yes, this answer's code is faster than the improved solution once you have more than ~2^{10} input strings, or if your strings are longer than ~2^{20} digits in total (these are of course only order estimates, since the constant factor may be different in different implementations). – Gassa Apr 16 '18 at 22:11
  • @AMJ As for the faster solution's explanation, I gave it a try, but it turns into a generic writeup about DP with bitmasks, and gets a few times longer than the current answer, so I don't see how it fits here. If you are interested, I'd suggest reading about [dynamic programming with bitmasks](https://www.google.com/search?q=dynamic+programming+with+bitmask) in general, and if you still have a question after reading, perhaps it's better to ask that concrete question separately. – Gassa Apr 16 '18 at 22:16
1

you can use StringBuilder for better performance on Strings and allocating memory (might be minor effect in this case)

StringBuilder sb = new StringBuilder();
sb.append(tickets[i]).append(tickets[i]);

you can transform to string using sb.toString() and do all string operation with it

Note: don't create StringBuilder instance every time, use delete to clear the array

another approach you can do is using a Set to check all the numbers of 0 to 9 are present, simply by checking length

    Set<Character> set = new HashSet<Character>();
    for(Character c : "8654231097777".toCharArray()){
        set.add(c);
    }
    System.out.println(set.size());

by this you doing only one pass on the string in contrast to multiple passes (each contains call)

but it still remains O(1): instead of 10 loops you will do 1...

shahaf
  • 4,750
  • 2
  • 29
  • 32
  • Okay, thanks for the suggestion. Let me try that once. Though I agree it will be helpful, but not very sure it will make very big difference. Just trying to explore if there is any other way of solving the problem improving the complexity(2 big loops and so many 'contains' calls:( ).. – A MJ Apr 14 '18 at 09:48
  • 1
    @AMJ try the above, using `Set` to reduce the `contains` calls – shahaf Apr 14 '18 at 09:55
  • @AMJ Infact it makes a lot of difference in your case if your array is going to be huge, you will get PermGen issues. – wandermonk Apr 14 '18 at 09:56
  • 10 loops are also `O(1)`. – lexicore Apr 14 '18 at 10:10
  • @shahaf Ah, yes, OK. I've interpreted it another way round. Sorry. – lexicore Apr 14 '18 at 10:12
  • I tried this and did not feel too much different. I will try the one shared from @lexicore , looks good. Thanks for your inputs. – A MJ Apr 14 '18 at 17:56
1

First, you shouldn't try to optimize if there's no need to. Unless your array is very large, or you're doing that a big number of types, there are few chances that this constitutes a performance problem.

Here are some ideas anyway about what is slow in your solution, and what could make it faster:

  • you're concatenating strings, only to count characters in both strings. This is unnecessary.
  • you could take shortcuts if you know that one of the strings has all the characters: all the pairs it's involved in can be added to the count without checking the other side.
  • you call contains() 10 times on each string, for ech pair. And each contains() needs to traverse the string until it finds the searched substring. And contains() works with substrings, not individual characters.

Here's how it could most probably be made much faster easily:

  1. Traverse the array, and create another array containing a BitSet (or, much more efficient, a short) for each of the string. The BitSet (or the short) would contain 10 bits (one for each digit), which would be true if the string contains that number.
  2. Use your algorithm, but replace the inner check by a check that bitset1 or bitset2 has a cardinality of 10.

For the first part, transforming the String into a BitSet (or short) should simply be done by iterating through the characters of the string, and setting the bit bit corresponding to the digit to true.

JB Nizet
  • 678,734
  • 91
  • 1,224
  • 1,255
  • I won't use `BitSet`s for 10 bits. A `short` would work. – lexicore Apr 14 '18 at 10:07
  • @lexicore indeed, using short significantly improves the performance (at the price of slightly more complex bit manipulation). With BitSet, the performance is x3. With short, it's x100. – JB Nizet Apr 14 '18 at 10:21
  • Thanks for your inputs. I have accepted the answer from lexicore, as it has more details and code. – A MJ Apr 14 '18 at 18:44
0

Preparation for optimization

Firstly, we will refactor inner loop

for (int j = i + 1; j < tickets.length; j++) {
    String pair = tickets[i] + tickets[j];

    if (pair.length() < 10) {
        continue;
    }

    if (containsAllDigit(pair)) {
        count++;
    }
}

So, we've just created new function containsAllDigit

private static boolean containsAllDigit(String pair) {
    return pair.contains("0") 
            && pair.contains("1") && pair.contains("2") 
            && pair.contains("3") && pair.contains("4") 
            && pair.contains("5") && pair.contains("6") 
            && pair.contains("7") && pair.contains("8")
            && pair.contains("9");
}

Now, let's containsAllDigit method also to refactoring

private static boolean containsAllDigit(String pair) {
    String[] digits = 
            new String[] { "0", "1", "2", "3", "4", "5", "6", "7", "8", "9" };

    for (String digit : digits) {
        if (!pair.contains(digit)) {
            return false;
        }
    }
    return true;
}

Basically, this check is equal to the previous check. && is short circuit operator and the first time when the method contains returns false it will stop with evaluation of the boolean expression.

Finally, we will pass the list of digits as an argument. So, the method containsAllDigit will be private static boolean containsAllDigit(String pair, String[] digits)


Optimization

Let's analyze the structure of each pair. Each pair contains left and right side. If the left side contains all digits then there is no reason to check the right side.
For example, this array {"123456789", "1", "2", "3"}. Here is the answer 3.

Now, let's remove digit 1 from the first element of the array. Now the answer for "complete" pairs is 1. And when we check pairs, we only check does the right side contains digit 1.

If we generalize this approach, we only need to check the right part of the pair does it contain missing digits.
Now we create a method to find missing digits, call it findMissingDigits. As parameters, we pass the left part of a pair, and the second argument is the list of digits - String[] digits.

private static String[] findMissingDigits(String left, String[] digits) {
    List<String> ret = new ArrayList<>();
    for (String digit : digits) {
        if (!left.contains(digit)) {
            ret.add(digit);
        }
    }

    return ret.toArray(new String[0]);
}

And the last step is to change inner loop. Method containsAllDigit now get the list of missing digits instead of the list of all digits.

for (int i = 0; i < tickets.length - 1; i++) {
    String left = tickets[i];

    String[] missingDigits = findMissingDigits(left, digits);
    for (int j = i + 1; j < tickets.length; j++) {
        String pair = left + tickets[j];

        if (pair.length() < 10) {
            continue;
        }

        if (containsAllDigit(pair, missingDigits)) {
            count++;
        }
    }
}
djm.im
  • 3,295
  • 4
  • 30
  • 45
  • 1
    It is most probably an interview/quiz/homework task, so perfromance optimization may very well be the primary goal of the excercise. – lexicore Apr 14 '18 at 20:42