4

I have a two dimensional string array look like this: enter image description here
The first column contains characters of many strings, other columns are extra data for character.
I want to search a string (maybe change to array character) in this array to get all match indexes (start - end). For example, when I search with key "next", the result should be [5 - 8], [13 - 16] (the highlight parts in image above).
Shortly, I need a method look like this:

  public static List<Interval> search(String searchText, String[][] data, int columnsCount, int rowCount){
      // Convert search text to String array
      String[] searchArr = getStringArray(searchText);
      // then search in data

  }
  
  // where Interval is:
  public class Interval{
       public int start;
       public int end;
  }   

Is there any fast way to search like this,cause my data is very large?
Thanks in advance!

Community
  • 1
  • 1
ductran
  • 10,043
  • 19
  • 82
  • 165
  • One of the most effective search is `BinarySearch`. `Red Black Tree` or `AVL Tree` can be implemented for more effective search. – erencan Nov 29 '13 at 06:50
  • 2
    There are a whole bunch of [String searching algorithms](http://en.wikipedia.org/wiki/String_searching_algorithm). – Domi Nov 29 '13 at 06:51
  • 1
    Also, note that if your array contains [letter1, data, data, data..., letter2, data...], you will get a bad cache hit rate, which is essential for performance when working with large data sets. Try to re-arrange your data as [letter1, letter2, ... letterN, data, data, ...]. [Here is why.](http://stackoverflow.com/questions/16699247/what-is-cache-friendly-code) (don't mind that he is talking about C++, this applies to all languages). – Domi Nov 29 '13 at 06:54

4 Answers4

3

I would recommend to adapt the String[][] to a CharSequence. Then you are free to do everything you can do with a CharSequence and this also means that you can use java.util.regex.Matcher to search for the string and you don't need to implement an own search algorithm.

For example:

public class Main {
    public static void main(String[] args) {
        String[][] array2d = createArray();

        int charSeqColumn = 0;
        CharSequence charSequnce = new Array2DColumnCharSequnce(array2d, charSeqColumn);

        System.out.println(charSequnce.toString());

        Pattern patttern = Pattern.compile("ext");
        Matcher matcher = patttern.matcher(charSequnce);

        while (matcher.find()) {
            String matchGroup = matcher.group();
            int start = matcher.start();
            int end = matcher.end() - 1;

            String msg = MessageFormat.format("{0} matched at: [{1}] - [{2}]", matchGroup, start, end);
            System.out.println(msg);
        }
    }

    private static String[][] createArray() {
        String[][] array2d = new String[2][10];
        array2d[0][0] = "N";
        array2d[0][1] = "e";
        array2d[0][2] = "x";
        array2d[0][3] = "t";
        array2d[0][4] = " ";
        array2d[0][5] = "N";
        array2d[0][6] = "e";
        array2d[0][7] = "x";
        array2d[0][8] = "t";
        array2d[0][9] = " ";

        array2d[1][0] = "H";
        array2d[1][1] = "e";
        array2d[1][2] = "l";
        array2d[1][3] = "l";
        array2d[1][4] = "o";
        array2d[1][5] = "W";
        array2d[1][6] = "o";
        array2d[1][7] = "r";
        array2d[1][8] = "l";
        array2d[1][9] = "d";
        return array2d;
    }
}

will output

Next Next 
ext matched at: [1] - [3]
ext matched at: [6] - [8]

I would implement the CharSequence adaption like this

class Array2DColumnCharSequnce implements CharSequence {

    private int column;
    private String[][] array2d;
    private int endIndex;
    private int startIndex;

    public Array2DColumnCharSequnce(String[][] array2d, int column) {
        this(array2d, column, 0, array2d[column].length);
        this.array2d = array2d;
        this.column = column;
    }

    public Array2DColumnCharSequnce(String[][] array2d, int column,
            int startIndex, int endIndex) {
        this.array2d = array2d;
        this.column = column;
        this.startIndex = startIndex;
        this.endIndex = endIndex;
    }

    public int length() {
        return endIndex - startIndex;
    }

    public char charAt(int index) {
        String charString = array2d[column][startIndex + index];
        return charString.charAt(0);
    }

    public CharSequence subSequence(int start, int end) {
        Array2DColumnCharSequnce array2dColumnCharSequnce = new Array2DColumnCharSequnce(
                array2d, column, start, end);
        return array2dColumnCharSequnce;
    }

    @Override
    public String toString() {
        StringBuilder sb = new StringBuilder(this);
        return sb.toString();
    }
}

Note: The Array2DColumnCharSequnce is just a quick implementation and it does not address exception handling yet nor it addresses what happens when there are more than one char in a string column.

Why to use a CharSequence decorator

The difference with adapting the array to a CharSequence to other approaches is that you use a standard java interface that can be re-used with many other classes and thus is very flexible.

Some often used standard java classes that take a CharSequence as parameter

See the full list here.

Use the code above and try this to see how flexibe the decorator is.

public static void main(String[] args) {
    String[][] array2d = createArray();

    CharSequence charSequnce = new Array2DColumnCharSequnce(array2d, 0);

    boolean contentEquals = "Next Next ".contentEquals(charSequnce);
    System.out.println(contentEquals);

    CharSequence column1CharSequnce = new Array2DColumnCharSequnce(array2d, 1);
    String replaced = "I want to say Next Next ".replace(charSequnce, column1CharSequnce);
    System.out.println(replaced);
}

will output

true
I want to say HelloWorld

Finally everyone has to decide what he/she wants and what fits the situation. I prefer implementations that give me more options if I can get them "almost" for free.

René Link
  • 48,224
  • 13
  • 108
  • 140
  • How can I get more than one match strings by this way? – ductran Nov 29 '13 at 07:54
  • Thanks, I wonder do we need a `CharSequence`? I just export my first column to a string, then use `Matcher` to find, just like @Trying said, it is similar to search a subString in a String. For example, I search `next` in `abcdnextponexnextpour` by using Matcher and regex and got the same result. Do you think there is something wrong with this way? – ductran Nov 30 '13 at 10:38
  • @R4j The short answer: no we don't need a `CharSequence`. I wanted to show an object oriented way of implementing your issue. My approach is a decorator pattern on the 2d array. The only advantage of this is that I don't need to create a new string object and copy the chars. But like I said before... I just wanted to show an oo way. I think that there is nothing wrong with your way. – René Link Nov 30 '13 at 21:49
  • How to detect full location of character? Like "e matched at [0][5]" or "f matched at [1][8]"? – AloneInTheDark Mar 04 '14 at 14:40
1

It is similar to search a subString in a String.

e.g.

A B C D N E X T J H  J  N   E   N   E   X   T   O 

0 1 2 3 4 5 6 7 8 9 10  11  12  13  14  15  16  17

So the answer should be [4-7] and [13-16].

public static List<Integer> findIndexes(String source, String toFind){
    List<Integer> list = new LinkedList<Integer>();//it will return the starting indexes of the found substring, we can easily find the end e=index by adding the length of the other. 
    int start = 0;
    while(start < source.length()){
        if(source.charAt(start)==toFind.charAt(0)){//if the char is same then find whether the whole toFind string is present or not.
            if(isMatch(source, toFind, start)){//if it is found than increment the source pointer to the end after the toFind string
                list.add(start);
                start = start+toFind.length();
                continue;
            }
        }
        start++;
    }
    return list;
}
private static boolean isMatch(String s1, String s2, int srcIndex){
    int desIndex = 0;
    while(desIndex<s2.length() && s1.charAt(srcIndex)==s2.charAt(desIndex)){
        srcIndex++;
        desIndex++;
    }
    if(desIndex==s2.length()){
        return true;
    }
    return false;
}

And sample driver program:

public static void main(String[] args) {    
        String s1="abcdnextponexnextpour";
        String s2 = "next";
        List<Integer> list = findIndexes(s1, s2);
        for(int i : list){
            System.out.println(i);
        }
    }

It will output the indexes:

4
13

i.e. you can add the length of the toFind String to calculate the last index.

Trying
  • 14,004
  • 9
  • 70
  • 110
0

I would implement search as follows -

public static List<Interval> search(
    String searchText, String[][] data) {
  List<Interval> al = new ArrayList<>();
  if (searchText != null) {
    searchText = searchText.trim().toUpperCase();
    char[] toMatch = searchText.toCharArray();
    for (int i = 0; i < data.length; i++) {
      if (data[i] != null && data.length > i
          && data[i].length > 0
          && data[i][0].charAt(0) == toMatch[0]) {
        boolean matched = true;
        for (int t = 1; t < toMatch.length; t++) {
          if (i + t > data.length
              || data[i + t][0].charAt(0) != toMatch[t]) {
            i += (t - 1);
            matched = false;
            break;
          }
        }
        if (matched) {
          Interval interval = new Interval();
          interval.start = i - 1;
          interval.end = interval.start + (toMatch.length - 1);
          al.add(interval);
        }
      }
    }
  }
  return al;
}

And, I would modify Interval to add a toString() like this

public String toString() {
  return String.valueOf(start) + "-" + end;
}

Finally, to test it I would use this main method.

public static void main(String[] args) {
  String[][] test = { { "N" }, { "A" }, { "N" },
      { "A" }, { "T" }, { "A" }, { "N" }, { "E" },
      { "X" }, { "T" }, { "E" }, { "R" }, { "N" },
      { "B" }, { "N" }, { "E" }, { "X" }, { "T" } };
  List<Interval> al = search("next", test);
  for (Interval i : al) {
    System.out.println(i);
  }
}

And I did receive this output -

5-8
13-16
Elliott Frisch
  • 198,278
  • 20
  • 158
  • 249
0

This is your solution:

   void main(String a[][],String k){
   String m="";
   for(int i=0;i<a.length;i++)
   m+=a[i][0];
   int n=0,x;
   while(n<m.length()){
   n=m.indexOf(k,n);
   x=n+k.length();
   System.out.println(n+"-"+x);
   n=x;
   }
   }
   void main(String a[][],char k){
   for(int i=0;i <a.length;i++)
   if(a[i][0]==k)System.out.println(i);
   }

it extracts the first strings of the dda and searches it. you may generate the value n and x as class interval and include it in list.

aniruddha
  • 61
  • 5