0

I am doing the following programming exercise: String searching with wildcard. The statement is:

The method below, is the most simple string search algorithm. It will find the first occurrence of a word in a text string.

haystack = the whole text

needle = searchword

wildcard = _

find("strike", "i will strike down upon thee"); // return 7

The find method is already made.

The problem is to implement wildcard(s) in the needle. If you have a _ in the needle it will match any character in the haystack.

A normal string search algorithm will find the first occurrence of a word(needle) in a text(haystack), starting on index 0. Like this:

find("strike", "I will strike down upon thee"); return 7

A wildcard in the needle will match any character in the haystack. The method should work on any types of needle, and haystasks. You can assume the needle is shorter than(or equal to) the haystack.

find("g__d", "That's the good thing about being president"); // return 11

If no match the method should return -1

We have written the following code:

import java.util.regex.*;
public class SearchEngine {
    static int find(String needle, String haystack){
      System.out.println("needle: "+needle);
      System.out.println("haystack: "+haystack);
      String regex = needle.replace("_",".");
      if(regex.equals(needle)){
        return haystack.indexOf(needle);
      }
      System.out.println("regex: "+regex);
      Matcher m = Pattern.compile(regex).matcher(haystack);
      int pos = -1;
      if(m.find()){
        pos = m.start();
      }
      System.out.println("pos: "+pos);
      return pos;
    }
}

We have found a curious test where it does not pass.Being the test cases:

import org.junit.Test;
import static org.junit.Assert.assertEquals;

public class WildsTest {
    String haystack = "Once upon a midnight dreary, while I pondered, weak and weary";    
    @Test
    public void normalSearchTest(){
        assertEquals(0,SearchEngine.find("Once", haystack));
        assertEquals(12, SearchEngine.find("midnight", haystack));
        assertEquals(-1, SearchEngine.find("codewars", haystack));
    }
    @Test
    public void wildSearchTest(){
        assertEquals(5, SearchEngine.find("_po_", haystack));
        assertEquals(12, SearchEngine.find("___night", haystack));
        assertEquals(3, SearchEngine.find("___4$&%$--___", "-..,.44$&%$--,.,"));
    }
 }

It fails in the last case:

needle: ___4$&%$--___
haystack: -..,.44$&%$--,.,
regex: ...4$&%$--...
pos: -1

Why is the reason that the regex does not match "...4$&%$--..." inside "-..,.44$&%$--,.,"?

We have also read:

EDIT:

We have followed @Alex suggestion, and tried to use Pattern.quote:

import java.util.regex.*;
public class SearchEngine {
    static int find /**/ (String needle, String haystack){
      System.out.println("needle: "+needle);
      System.out.println("haystack: "+haystack);
      String regex = needle.replace("_",".");
      if(regex.equals(needle)){
        return haystack.indexOf(needle);
      }
      System.out.println("regex: "+regex);
      String quotedRegex = Pattern.quote(regex);
      System.out.println("quotedRegex: "+quotedRegex);
      Matcher m = Pattern.compile(quotedRegex).matcher(haystack);
      int pos = -1;
      if(m.find()){
        pos = m.start();
      }
      System.out.println("pos: "+pos);
      return pos;
    }
}

However we have found the following trace:

needle: _po_
haystack: Once upon a midnight dreary, while I pondered, weak and weary
regex: .po.
quotedRegex: \Q.po.\E
pos: -1
expected:<5> but was:<-1>

How could we use Pattern.quote to achieve searching with wildcards?

In addition we have followed @s.fuhrm suggestion and replaced the characters with special meaning, in this case the $, with "\\$"

import java.util.regex.*;
public class SearchEngine {
    static int find /**/ (String needle, String haystack){
      System.out.println("needle: "+needle);
      System.out.println("haystack: "+haystack);
      String regex = needle.replace("_",".");
      if(regex.equals(needle)){
        return haystack.indexOf(needle);
      }
      System.out.println("regex: "+regex);

      Matcher m = Pattern.compile(regex.replace("$","\\$")).matcher(haystack);
      int pos = -1;
      if(m.find()){
        pos = m.start();
      }
      System.out.println("pos: "+pos);
      return pos;
    }
}

Being this the code which passes the tests.

Yone
  • 2,064
  • 5
  • 25
  • 56

2 Answers2

2

There are characters in your 'needle' which have special meaning in regex, namely a dollar sign $ which means 'end of line' in regex. You should escape such special characters when making a regex in order to march a literal string. You can use Pattern.quote method to do this.

Alex Sveshnikov
  • 4,214
  • 1
  • 10
  • 26
1

Why is the reason that the regex does not match "...4$&%$--..." inside "-..,.44$&%$--,.,"?

At least $ is a regular expression matching at the end of line. This is not what you're wanting. you need to replace $ with \$ respectively "\\$"

s.fuhrm
  • 438
  • 4
  • 9