12

I'm trying to find all occurrences of a substring in a string in Java.

For example: searching "ababsdfasdfhelloasdf" for "asdf" would return [7,16] since there are 2 "asdf"s, one at position 7 and one at 16 (counting from 0). Searching "aaaaaa" for "aa" would return [0,1,2,3,4] because there is an "aa" at positions 0, 1, 2, 3, and 4.

I tried this:

public List<Integer> findSubstrings(String inwords, String inword) {
    String copyOfWords = inwords;
    List<Integer> indicesOfWord = new ArrayList<Integer>();
    int currentStartIndex = inwords.indexOf(inword);
    int indexat = 0;
    System.out.println(currentStartIndex);
    while (currentStartIndex > 0) {
        indicesOfWord.add(currentStartIndex + indexat);
        System.out.println(currentStartIndex);
        System.out.println(indicesOfWord);
        indexat += currentStartIndex;
        copyOfWords = copyOfWords.substring(currentStartIndex);
        System.out.println(copyOfWords);
        currentStartIndex = copyOfWords.indexOf(inword);
    }
    return indicesOfWord;
}

This problem can be solved in Python as follows:

indices = [m.start() for m in re.finditer(word, a.lower())]

where "word" is the word I'm looking for and "a" is the string I'm searching through.

How can I achieve this in Java?

pants
Kevin
  • I think the top post [here](http://stackoverflow.com/questions/767759/occurrences-of-substring-in-a-string) may help you. To get the indexes, just print or save each `lastIndex` as you find it. – BrockLee Sep 25 '15 at 18:25
  • Do you mean you need [something like this](http://ideone.com/9IeCEQ)? – Wiktor Stribiżew Sep 25 '15 at 18:27
  • Please use more meaningful variable names. It's hard to understand what `cthing1` or `outthing` or `niwords` mean. Using names like `lastIndex`, `indexList`, etc. would make it easier to understand what you wrote and to correct it. – RealSkeptic Sep 25 '15 at 18:43

2 Answers

15

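You can put a capturing group inside a positive look-ahead to get all overlapping matches, then use Matcher#start to get the index of each one. Because a look-ahead consumes no characters, the matcher moves forward one position at a time and also reports matches that overlap.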

As for the regex, it will look like

(?=(aa))

In Java code:

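import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String s = "aaaaaa";
// The look-ahead is zero-width, so every position where "aa" starts is matched.
Matcher m = Pattern.compile("(?=(aa))").matcher(s);
List<Integer> pos = new ArrayList<Integer>();
while (m.find()) {
    pos.add(m.start()); // start index of the current match
}
System.out.println(pos);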

Result:

[0, 1, 2, 3, 4]

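The look-ahead approach above hard-codes the search word into the pattern. A minimal sketch of a reusable version follows, assuming the word should be matched literally: `Pattern.quote` escapes it so that characters such as `.` or `*` are not read as regex syntax. The class name `RegexSubstringFinder`, the method name `findAll`, and the choice to return an empty list for an empty word are illustrative assumptions, not part of the original answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the names are assumptions for illustration.
final class RegexSubstringFinder {

    /** Returns the start index of every occurrence of word in text, including overlapping ones. */
    static List<Integer> findAll(String text, String word) {
        List<Integer> positions = new ArrayList<>();
        if (word.isEmpty()) {
            return positions; // assumption: an empty search word yields no matches
        }
        // Quote the word so it is matched literally, then wrap it in a
        // zero-width look-ahead so that overlapping occurrences are reported.
        Pattern pattern = Pattern.compile("(?=" + Pattern.quote(word) + ")");
        Matcher matcher = pattern.matcher(text);
        while (matcher.find()) {
            positions.add(matcher.start());
        }
        return positions;
    }
}
```

Calling `RegexSubstringFinder.findAll("aaaaaa", "aa")` should give the same list as the snippet above. For a case-insensitive search like the Python one in the question, one option is to lower-case both arguments before calling the method.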

Wiktor Stribiżew
5

A regex is heavier than necessary for finding a plain substring, and it becomes a real problem if the substring contains special regex characters such as `.`, which would have to be escaped. Here's a solution adapted from this answer:

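import java.util.ArrayList;
import java.util.List;

String str = "helloslkhellodjladfjhello";
String findStr = "hello";
int lastIndex = 0;
List<Integer> result = new ArrayList<Integer>();

while (lastIndex != -1) {
    // Search from lastIndex onward; indexOf returns -1 once nothing is left.
    lastIndex = str.indexOf(findStr, lastIndex);

    if (lastIndex != -1) {
        result.add(lastIndex);
        lastIndex += 1; // advance by one so overlapping matches are found too
    }
}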
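The loop above can be wrapped in a method so it returns the list the question asks for. This is a minimal sketch under a few assumptions: the class name `IndexOfSubstringFinder` and the method name `findAll` are made up for illustration, and an empty search word is treated as having no matches. That guard matters because `indexOf` with an empty argument never returns -1, so the loop would otherwise never end.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class; the names are assumptions for illustration.
final class IndexOfSubstringFinder {

    /** Returns the start index of every occurrence of word in text, including overlapping ones. */
    static List<Integer> findAll(String text, String word) {
        List<Integer> result = new ArrayList<>();
        if (word.isEmpty()) {
            return result; // assumption: no matches for an empty word; also prevents an endless loop
        }
        int index = text.indexOf(word);
        while (index != -1) {
            result.add(index);
            // Restart one character after the last hit so overlapping matches are kept.
            index = text.indexOf(word, index + 1);
        }
        return result;
    }
}
```

With the examples from the question, `findAll("aaaaaa", "aa")` should return the five overlapping positions 0 through 4. To skip overlapping matches instead, the search could resume at `index + word.length()` rather than `index + 1`.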
Alex Hall