4

I'm trying to find every possible anagram of a string in Java - By this I mean that if I have a 4 character long word I want all the possible 3 character long words derived from it, all the 2 character long and all the 1 character long. The most straightforward way I tought of is to use two nested for loops and iterare over the string. This is my code as of now:

private ArrayList<String> subsets(String word){
        ArrayList<String> s = new ArrayList<String>();
        int length = word.length();
        for (int c=0; c<length; c++){
            for (int i=0; i<length-c; i++){
                String sub = word.substring(c, c+i+1);
                System.out.println(sub);
                //if (!s.contains(sub) && sub!=null) 
                    s.add(sub);
            }
        }
        //java.util.Collections.sort(s, new MyComparator());
        //System.out.println(s.toString());
        return s;
    }

My problem is that it works for 3 letter words, fun yelds this result (Don't mind the ordering, the word is processed so that I have a string with the letters in alphabetical order):

f
fn
fnu
n
nu
u

But when I try 4 letter words, it leaves something out, as in catq gives me:

a
ac
acq
acqt
c
cq
cqt
q
qt
t

i.e., I don't see the 3 character long word act - which is the one I'm looking for when testing this method. I can't understand what the problem is, and it's most likely a logical error I'm making when creating the substrings. If anyone can help me out, please don't give me the code for it but rather the reasoning behind your solution. This is a piece of coursework and I need to come up with the code on my own.

EDIT: to clear something out, for me acq, qca, caq, aqc, cqa, qac, etc. are the same thing - To make it even clearer, what happens is that the string gets sorted in alphabetical order, so all those permutations should come up as one unique result, acq. So, I don't need all the permutations of a string, but rather, given a 4 character long string, all the 3 character long ones that I can derive from it - that means taking out one character at a time and returning that string as a result, doing that for every character in the original string. I hope I have made my problem a bit clearer

Luca Giorgi
  • 910
  • 3
  • 14
  • 28
  • 6
    This is very similar to finding a [power set](http://en.wikipedia.org/wiki/Power_set). There are many algorithms for finding powersets, you should look into that. – David says Reinstate Monica Feb 18 '15 at 22:48
  • 7
    Doesn't work for 3s either. You have "fn" but not "fu" – Lee Daniel Crocker Feb 18 '15 at 22:51
  • Just a general advice: Step-by-step debugging can be very helpful for these kind of logical errors. – runDOSrun Feb 18 '15 at 22:51
  • Not always- step-by-step debugging is great finding out why it's doing stuff it shouldn't. Not so good when finding out why it's not doing something it should. – Flynn1179 Feb 18 '15 at 22:52
  • 1
    Anagram would be a better word than substring – Cole Tobin Feb 18 '15 at 22:55
  • The mathematical term is "unique combinations of a multiset". – Lee Daniel Crocker Feb 18 '15 at 23:04
  • Ok, after a multitude of comments.. not going to give you the solution, as it's coursework, but consider that the number of results you should get is 2^n, where n is the number of letters. I'm including an empty string as one of the results in that. When's your coursework due to be handed in? I'll post an answer a day after that :) – Flynn1179 Feb 18 '15 at 23:05
  • @Flynn1179, in two weeks time, but I really want to understand it on my own! – Luca Giorgi Feb 18 '15 at 23:09

4 Answers4

1

It's working fine, you just misspelled "caqt" as "acqt" in your tests/input.

(The issue is probably that you're sorting your input. If you want substrings, you have to leave the input unsorted.)

After your edits: see Generating all permutations of a given string Then just sort the individual letters, and put them in a set.

Community
  • 1
  • 1
mk.
  • 11,360
  • 6
  • 40
  • 54
  • 1
    as algorithmically defined, those would be equivalent starting strings. – aruisdante Feb 18 '15 at 22:53
  • 1
    No, he alphabetises the output. – nneonneo Feb 18 '15 at 22:53
  • 1
    Now I think about it, I'm not so sure. Nobody said anything about finding all anagrams, it's only about finding substrings, and sorting the letters in the result. Although I think it should have been 'actq' or something, to get 'act' as a substring.. well, any permutation with Q at the end. – Flynn1179 Feb 18 '15 at 22:53
  • @Flynn1179: `catq` -> `cat` -> `act`; remember he's sorting the output. – nneonneo Feb 18 '15 at 22:56
  • Flynn, the order of the characters is not important in what I'm trying to do, and I order them in alphabetical order. catq should have all the 3 character variations, meaning [cat, caq, ctq, atq]. for me cat is the equivalent of act or tac. – Luca Giorgi Feb 18 '15 at 22:58
  • 2
    Right, but `acqt` -> `acq` -> `acq`. He needs to stop sorting the **input** if he wants proper substrings – mk. Feb 18 '15 at 22:58
  • Yeah, I just saw that myself, it looks as if the input's being sorted, not the output. – Flynn1179 Feb 18 '15 at 22:58
  • 1
    @mk.: Ahahaha, you're right. That's the bug. (Can you edit to add that? Then I can retract a downvote :P) – nneonneo Feb 18 '15 at 22:59
  • @mk., My problem would still be that I don't get all the possible 3 character words derived from acqt though, wouldn't it? – Luca Giorgi Feb 18 '15 at 23:00
  • 1
    all the possible words derivable from acqt include tqc tqa acq tcq qct ... but that's a different question, and doesn't seem to be what your code is intending to do – mk. Feb 18 '15 at 23:02
  • Yeah, but he's saying TCQ and QCT are equivalent, because they contains the same subset of letters. I think 'substrings' is a bad choice of words, it sounds like what's needed is all possible subsets of letters from the input. – Flynn1179 Feb 18 '15 at 23:03
  • 1
    @mk. that's exactly my problem, it's what I'd like my code to do – Luca Giorgi Feb 18 '15 at 23:03
1

Ok, as you've already devised your own solution, I'll give you my take on it. Firstly, consider how big your result list is going to be. You're essentially taking each letter in turn, and either including it or not. 2 possibilities for each letter, gives you 2^n total results, where n is the number of letters. This of course includes the case where you don't use any letter, and end up with an empty string.

Next, if you enumerate every possibility with a 0 for 'include this letter' and a 1 for don't include it, taking your 'fnu' example you end up with:

000 - ''
001 - 'u'
010 - 'n'
011 - 'nu'
100 - 'f'
101 - 'fu' (no offense intended)
110 - 'fn'
111 - 'fnu'.

Clearly, these are just binary numbers, and you can derive a function that given any number from 0-7 and the three letter input, will calculate the corresponding subset.

It's fairly easy to do in java.. don't have a java compiler to hand, but this should be approximately correct:

public string getSubSet(string input, int index) {
  // Should check that index >=0 and < 2^input.length here.
  // Should also check that input.length <= 31.
  string returnValue = "";
  for (int i = 0; i < input.length; i++) {
    if (i & (1 << i) != 0) // 1 << i is the equivalent of 2^i
      returnValue += input[i];
  }
  return returnValue;
}

Then, if you need to you can just do a loop that calls this function, like this:

for (i = 1; i < (1 << input.length); i++)
  getSubSet(input, i); // this doesn't do anything, but you can add it to a list, or output it as desired.

Note I started from 1 instead of 0- this is because the result at index 0 will be the empty string. Incidentally, this actually does the least significant bit first, so your output list would be 'f', 'n', 'fn', 'u', 'fu', 'nu', 'fnu', but the order didn't seem important.

Flynn1179
  • 11,925
  • 6
  • 38
  • 74
0

This is the method I came up with, seems like it's working

private void subsets(String word, ArrayList<String> subset){
        if(word.length() == 1){
            subset.add(word);
            return;
        } 
        else {
            String firstChar = word.substring(0,1);
            word = word.substring(1);
            subsets(word, subset);
            int size = subset.size();
            for (int i = 0; i < size; i++){
                String temp = firstChar + subset.get(i);
                subset.add(temp);
            }
            subset.add(firstChar);
            return;
        }
    }

What I do is check if the word is bigger than one character, otherwise I'll add the character alone to the ArrayList and start the recursive process. If it is bigger, I save the first character and make a recursive call with the rest of the String. What happens is that the whole string gets sliced in characters saved in the recursive stack, until I hit the point where my word has become of length 1, only one character remaining.

When that happens, as I said at the start, the character gets added to the List, now the recursion starts and it looks at the size of the array, in the first iteration is 1, and then with a for loop adds the character saved in the stack for the previous call concatenated with every element in the ArrayList. Then it adds the character on its own and unwinds the recursion again. I.E., with the word funthis happens:

f saved
List empty
recursive call(un)
-
u saved
List empty
recursive call(n)
-
n.length == 1
List = [n]
return
-
list.size=1
temp = u + list[0]
List = [n, un]
add the character saved in the stack on its own
List = [n, un, u]
return
-
list.size=3
temp = f + list[0]
List = [n, un, u, fn]
temp = f + list[1]
List = [n, un, u, fn, fun]
temp = f + list[2]
List = [n, un, u, fn, fun, fu]
add the character saved in the stack on its own
List = [n, un, u, fn, fun, fu, f]
return

I have been as clear as possible, I hope this clarifies what was my initial problem and how to solve it.

Luca Giorgi
  • 910
  • 3
  • 14
  • 28
0

This is working code:

public static void main(String[] args) {
    String input = "abcde";
    Set<String> returnList = permutations(input);
    System.out.println(returnList);
}

private static Set<String> permutations(String input) {
    if (input.length() == 1) {
        Set<String> a = new TreeSet<>();
        a.add(input);
        return a;
    }
    Set<String> returnSet = new TreeSet<>();

    for (int i = 0; i < input.length(); i++) {
        String prefix = input.substring(i, i + 1);
        Set<String> permutations = permutations(input.substring(i + 1));
        returnSet.add(prefix);
        returnSet.addAll(permutations);
        Iterator<String> it = permutations.iterator();
        while (it.hasNext()) {
            returnSet.add(prefix + it.next());
        }
    }
    return returnSet;
}
Wadi
  • 29
  • 5
  • 1
    I'm tempted to down vote this simply because you didn't read the question. He doesn't want code, he needs to understand how to do it. Code with no explanation is pretty much exactly what he doesn't want. There's a difference between answering a question, and providing a solution to a problem. Not all answers are solutions. – Flynn1179 Feb 19 '15 at 10:23