0

I have several ArrayList<String> windows, each containing a keyword.

A window is an ArrayList<String> in which the keyword is shown in bold. Structure of a window: 9 words before + keyword + 9 words after.

As you can see, some windows overlap:

[image: alt text]

How do I combine these ArrayLists to get a result like this:

[image: alt text]

Thanks

tiendv

2 Answers

4

If you're not too worried about performance, a simple subList/equals matching is very easy to write:

    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.List;

    public class JoinWindows {
        public static void main(String[] args) {
            String[] texts = {
                "sunset lake michigan michigan alaska water florida "
                + "peninsula third largest water seventh largest water "
                + "percentage edit list largest country",

                "michigan alaska water florida peninsula third largest water "
                + "seventh largest water percentage edit list largest country "
                + "subdivision list political",

                "third largest water seventh largest water percentage edit list "
                + "largest country subdivision list political geographic "
                + "subdivisions total edit references"
            };
            // Running result: every text merged so far, one word per element.
            List<String> joined = new ArrayList<String>();
            for (String text : texts) {
                List<String> textAsList = Arrays.asList(text.split(" "));
                final int N = joined.size();
                final int M = textAsList.size();
                // Try the longest possible overlap first, then shrink.
                // k == 0 always matches (two empty sublists), so a text
                // with no overlap is appended in full.
                for (int k = Math.min(N, M); k >= 0; k--) {
                    if (joined.subList(N - k, N).equals(textAsList.subList(0, k))) {
                        joined.addAll(textAsList.subList(k, M));
                        break;
                    }
                }
            }
            System.out.println(joined);
        }
    }

This prints:

[sunset, lake, michigan, michigan, alaska, water, florida,
peninsula, third, largest, water, seventh, largest, water,
percentage, edit, list, largest, country, subdivision, list,
political, geographic, subdivisions, total, edit, references]

The algorithm works as it says: to build List<String> joined, given a List<String> textAsList, we find the longest subList matching between the "tail" of joined and the "head" of textAsList.
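One case the loop above does not separate out: when k reaches 0 the two empty sublists compare equal, so a window with no overlap is still appended to joined. If non-overlapping windows should instead stay apart, here is a minimal sketch of one way to do that, assuming the windows arrive in document order. The names WindowChunks, overlap and mergeWindows are made up for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class WindowChunks {
    /** Length of the longest suffix of tail that equals a prefix of head (0 if none). */
    static int overlap(List<String> tail, List<String> head) {
        int n = tail.size(), m = head.size();
        for (int k = Math.min(n, m); k > 0; k--) {
            if (tail.subList(n - k, n).equals(head.subList(0, k))) {
                return k;
            }
        }
        return 0;
    }

    /** Merges each window into the previous chunk if they overlap; otherwise starts a new chunk. */
    static List<List<String>> mergeWindows(List<List<String>> windows) {
        List<List<String>> chunks = new ArrayList<>();
        for (List<String> window : windows) {
            if (window.isEmpty()) {
                continue;
            }
            if (!chunks.isEmpty()) {
                List<String> last = chunks.get(chunks.size() - 1);
                int k = overlap(last, window);
                if (k > 0) {
                    // Append only the words that are not already in the chunk.
                    last.addAll(window.subList(k, window.size()));
                    continue;
                }
            }
            // No overlap with the previous chunk: keep this window separate.
            chunks.add(new ArrayList<>(window));
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<List<String>> windows = Arrays.asList(
            Arrays.asList("a b c d".split(" ")),
            Arrays.asList("c d e".split(" ")),
            Arrays.asList("x y z".split(" ")));
        System.out.println(mergeWindows(windows)); // [[a, b, c, d, e], [x, y, z]]
    }
}
```

Note that this treats any overlap of one word or more as a match, so two unrelated windows that happen to end and start with the same word would be merged. Requiring a minimum k is one possible guard if that matters for your data.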

polygenelubricants
  • I have a document with some keywords. My goal is to find the text chunks that contain a keyword. To do this I have two steps. First: for each keyword, I find a window, which is an ArrayList<String> containing 9 words before the keyword + the keyword + 9 words after the keyword. After this step we have one window (ArrayList<String>) per keyword, and the windows may overlap. Second: I must combine the overlapping windows; after this step I will have text chunks that contain the keywords. The problem here is that if windows don't overlap, they are still added to List<String> joined. I have some keywords that I must receive after the join. Thanks in advance – tiendv May 25 '10 at 08:09
  • @tiendv: I have no idea what you're trying to say. Edit the question with more information for the benefit of everyone trying to help. Give examples to illustrate the different cases. The more the better. Also give bounds on operating parameters, because the best algorithm for this would be quite complicated, but an easy but practical solution seems to exist. – polygenelubricants May 25 '10 at 08:18
0

See How to Use Editor Panes and Text Panes and these examples using DefaultHighlighter.

Addendum: Ah, I thought you just needed the view. For the model, consider the Knuth–Morris–Pratt algorithm, discussed in this answer.
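As a rough illustration of how the Knuth–Morris–Pratt idea could apply to this problem, the sketch below treats each word as one symbol and computes the longest overlap between the end of one word list and the start of the next in a single pass. It is only a sketch under that assumption, not code from the linked answer. The names KmpOverlap, failureTable and overlap are made up for this example.

```java
import java.util.Arrays;
import java.util.List;

class KmpOverlap {
    /** failure[i] = length of the longest proper prefix of pattern[0..i] that is also a suffix of it. */
    static int[] failureTable(List<String> pattern) {
        int[] failure = new int[pattern.size()];
        int k = 0;
        for (int i = 1; i < pattern.size(); i++) {
            while (k > 0 && !pattern.get(i).equals(pattern.get(k))) {
                k = failure[k - 1];
            }
            if (pattern.get(i).equals(pattern.get(k))) {
                k++;
            }
            failure[i] = k;
        }
        return failure;
    }

    /** Largest k such that the last k words of joined equal the first k words of next. */
    static int overlap(List<String> joined, List<String> next) {
        if (next.isEmpty()) {
            return 0;
        }
        int[] failure = failureTable(next);
        int k = 0; // number of leading words of next matched so far
        // Only the last next.size() words of joined can take part in an overlap,
        // which also guarantees k never exceeds next.size() - 1 inside the loop.
        int start = Math.max(0, joined.size() - next.size());
        for (int i = start; i < joined.size(); i++) {
            while (k > 0 && !joined.get(i).equals(next.get(k))) {
                k = failure[k - 1];
            }
            if (joined.get(i).equals(next.get(k))) {
                k++;
            }
        }
        return k;
    }

    public static void main(String[] args) {
        List<String> joined = Arrays.asList("a b c d".split(" "));
        List<String> next = Arrays.asList("c d e".split(" "));
        System.out.println(overlap(joined, next)); // 2
    }
}
```

Once k is known, the merge step is the same as appending next.subList(k, next.size()) to the joined list. The benefit over repeated subList comparisons is that each window is scanned a linear number of times rather than once per candidate overlap length.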

trashgod
  • I wasn't asking about how to show or combine them on screen. I mean I have some ArrayList<String> windows like win1, win2, win3. How can I combine the windows if they overlap? Thanks – tiendv May 25 '10 at 03:18
  • @tiendv: Amended, but it looks like @polygenelubricants may have a good idea; this problem reminds me of matching overlapping gene sequences. – trashgod May 25 '10 at 03:41
  • @trashgod: yes, it does remind me of that too, and I was about to suggest that the most state-of-the-art algorithm probably involves suffix trees or suffix arrays. – polygenelubricants May 25 '10 at 03:51