2

If I have a piece of text around 3000 characters. I want search for strings with certain characteristics for example strings like [*].

That is, I want to get [a] and [bc] from

sjfhshdkfjhskdhfksdf[a]sfdsgfsdf[bc]

I know there is an algorithm called KMP that guarantee a linear time searching operation through a text, but here I don't have a fixed string to be found, maybe I have to use some regular expression at some place.

How can I do this better than O(n^2)? Is there any light libraries for this if I'm using java?

zonyang
  • 828
  • 3
  • 10
  • 24

2 Answers2

6

No libraries needed, you've effectively described a use case for regex! They are highly optimized for searching, and in this case will be O(n).

String str = "sjfhshdkfjhskdhfksdf[a]sfdsgfsdf[bc]";
List<String> allMatches = new ArrayList<>();
Matcher m = Pattern.compile("\\[[^\\]]*]").matcher(str);
while (m.find()) {
    allMatches.add(m.group());
}

Regex Demo

If you have any doubts though and really want some O(n) that you can see, here's an algorithm:

String str = "sjfhshdkfjhskdhfksdf[a]sfdsgfsdf[bc]";
List<String> allMatches = new ArrayList<>();
for (int i = str.indexOf('['), j; i != -1; i = str.indexOf('[', j + 1)) {
    j = str.indexOf(']', i + 1);
    // if `j` is -1, the brackets are unbalanced. Perhaps throw an Exception?
    allMatches.add(str.substring(i, j + 1));
}
Community
  • 1
  • 1
4castle
  • 32,613
  • 11
  • 69
  • 106
0

Here's how to do it in one line:

String[] hits = str.replaceAll("^.*?\\[|][^\\]]*$", "").split("].*?\\[");

This works by stripping off leading and trailing chars up to and including the first/last opening/closing square bracket, then splits on a close bracket to the next opening bracket (inclusive).

Bohemian
  • 412,405
  • 93
  • 575
  • 722
  • Nice! Are you sure of the performance? The lazy quantifiers seem like they could be improved. – 4castle Jun 18 '16 at 04:59
  • @4castle performance? My guess is this would execute in about 10 microseconds, which is "fast enough". But think of the developer performance too. Less code means less bugs, and less time to write it. – Bohemian Jun 18 '16 at 07:43