I am doing the following programming exercise: String searching with wildcard. The statement is:
The method below, is the most simple string search algorithm. It will find the first occurrence of a word in a text string.
haystack = the whole text
needle = searchword
wildcard = _
find("strike", "i will strike down upon thee"); // return 7
The find method is already made.
The problem is to implement wildcard(s) in the needle. If you have a _ in the needle it will match any character in the haystack.
A normal string search algorithm will find the first occurrence of a word(needle) in a text(haystack), starting on index 0. Like this:
find("strike", "I will strike down upon thee"); return 7
A wildcard in the needle will match any character in the haystack. The method should work on any types of needle, and haystasks. You can assume the needle is shorter than(or equal to) the haystack.
find("g__d", "That's the good thing about being president"); // return 11
If no match the method should return -1
We have written the following code:
import java.util.regex.*;
public class SearchEngine {
static int find(String needle, String haystack){
System.out.println("needle: "+needle);
System.out.println("haystack: "+haystack);
String regex = needle.replace("_",".");
if(regex.equals(needle)){
return haystack.indexOf(needle);
}
System.out.println("regex: "+regex);
Matcher m = Pattern.compile(regex).matcher(haystack);
int pos = -1;
if(m.find()){
pos = m.start();
}
System.out.println("pos: "+pos);
return pos;
}
}
We have found a curious test where it does not pass.Being the test cases:
import org.junit.Test;
import static org.junit.Assert.assertEquals;
public class WildsTest {
String haystack = "Once upon a midnight dreary, while I pondered, weak and weary";
@Test
public void normalSearchTest(){
assertEquals(0,SearchEngine.find("Once", haystack));
assertEquals(12, SearchEngine.find("midnight", haystack));
assertEquals(-1, SearchEngine.find("codewars", haystack));
}
@Test
public void wildSearchTest(){
assertEquals(5, SearchEngine.find("_po_", haystack));
assertEquals(12, SearchEngine.find("___night", haystack));
assertEquals(3, SearchEngine.find("___4$&%$--___", "-..,.44$&%$--,.,"));
}
}
It fails in the last case:
needle: ___4$&%$--___
haystack: -..,.44$&%$--,.,
regex: ...4$&%$--...
pos: -1
Why is the reason that the regex does not match "...4$&%$--..." inside "-..,.44$&%$--,.,"?
We have also read:
- Can Java String.indexOf() handle a regular expression as a parameter?
- https://howtodoinjava.com/regex/match-any-set-of-characters/
- Wildcard matching in Java
- Java string matching with wildcards
EDIT:
We have followed @Alex suggestion, and tried to use Pattern.quote:
import java.util.regex.*;
public class SearchEngine {
static int find /**/ (String needle, String haystack){
System.out.println("needle: "+needle);
System.out.println("haystack: "+haystack);
String regex = needle.replace("_",".");
if(regex.equals(needle)){
return haystack.indexOf(needle);
}
System.out.println("regex: "+regex);
String quotedRegex = Pattern.quote(regex);
System.out.println("quotedRegex: "+quotedRegex);
Matcher m = Pattern.compile(quotedRegex).matcher(haystack);
int pos = -1;
if(m.find()){
pos = m.start();
}
System.out.println("pos: "+pos);
return pos;
}
}
However we have found the following trace:
needle: _po_
haystack: Once upon a midnight dreary, while I pondered, weak and weary
regex: .po.
quotedRegex: \Q.po.\E
pos: -1
expected:<5> but was:<-1>
How could we use Pattern.quote to achieve searching with wildcards?
In addition we have followed @s.fuhrm suggestion and replaced the characters with special meaning, in this case the $, with "\\$"
import java.util.regex.*;
public class SearchEngine {
static int find /**/ (String needle, String haystack){
System.out.println("needle: "+needle);
System.out.println("haystack: "+haystack);
String regex = needle.replace("_",".");
if(regex.equals(needle)){
return haystack.indexOf(needle);
}
System.out.println("regex: "+regex);
Matcher m = Pattern.compile(regex.replace("$","\\$")).matcher(haystack);
int pos = -1;
if(m.find()){
pos = m.start();
}
System.out.println("pos: "+pos);
return pos;
}
}
Being this the code which passes the tests.