47

I need a comparator in Java which has the same semantics as the SQL 'like' operator. For example:

myComparator.like("digital","%ital%");
myComparator.like("digital","%gi?a%");
myComparator.like("digital","digi%");

should evaluate to true, and

myComparator.like("digital","%cam%");
myComparator.like("digital","tal%");

should evaluate to false. Any ideas how to implement such a comparator or does anyone know an implementation with the same semantics? Can this be done using a regular expression?

Alan Moore
Chris
  • See [RegexUtil#sqlPatternToRegex(String)](https://github.com/apache/cayenne/blob/master/cayenne-server/src/main/java/org/apache/cayenne/util/RegexUtil.java#L76) from the Apache Cayenne project. – Volkan Yazıcı Jun 26 '18 at 20:50

18 Answers

38

.* will match any characters in regular expressions

I think the java syntax would be

"digital".matches(".*ital.*");

And for the single character match just use a single dot.

"digital".matches(".*gi.a.*");

And to match an actual dot, escape it with a backslash (in a Java string literal the backslash itself must be doubled, giving "\\."):

\.
Bob
  • Yeah, thanks! But what if the pattern isn't as simple as "%dig%" and the string needs some escaping? Is there anything already existing? What about the '?'? – Chris May 22 '09 at 15:21
  • I edited my answer for the question mark operator. I am a little confused by the rest of your comment though. Are you saying the string is coming to you in SQL syntax and you want to evaluate it as is? If that is the case I think you will need to replace the SQL syntax manually. – Bob May 22 '09 at 15:24
  • What if the string used as a search pattern contains grouping characters like '(' or ')'? Escape them too? How many other characters need escaping? – Chris May 22 '09 at 15:25
  • I think that will depend on how many options you are allowing. – Bob May 22 '09 at 15:25
  • Just beware that .* is greedy (.*? might be more appropriate). I don't think .* in regex has exactly the same semantics as % in SQL. – GreenieMeanie May 22 '09 at 15:47
  • That is a good point, see this question for an explanation http://stackoverflow.com/questions/255815/how-can-i-fix-my-regex-to-not-match-too-much-with-a-greedy-quantifier – Bob May 22 '09 at 15:50
26

Regular expressions are the most versatile. However, some LIKE functions can be formed without regular expressions. e.g.

String text = "digital";
text.startsWith("dig"); // like "dig%"
text.endsWith("tal"); // like "%tal"
text.contains("gita"); // like "%gita%"
Peter Lawrey
25

Yes, this could be done with a regular expression. Keep in mind that Java's regular expressions have different syntax from SQL's "like". Instead of "%", you would have ".*", and instead of "?", you would have ".".

What makes it somewhat tricky is that you would also have to escape any characters that Java treats as special. Since you're trying to make this analogous to SQL, I'm guessing that ^$[]{}\ shouldn't appear in the regex string. But you will have to replace "." with "\\." before doing any other replacements. (Edit: Pattern.quote(String) escapes everything by surrounding the string with "\Q" and "\E", which will cause everything in the expression to be treated as a literal (no wildcards at all). So you definitely don't want to use it.)

Furthermore, as Dave Webb says, you also need to ignore case.

With that in mind, here's a sample of what it might look like:

public static boolean like(String str, String expr) {
    expr = expr.toLowerCase(); // ignoring locale for now
    expr = expr.replace(".", "\\."); // "\\" is escaped to "\" (thanks, Alan M)
    // ... escape any other potentially problematic characters here
    expr = expr.replace("?", ".");
    expr = expr.replace("%", ".*");
    str = str.toLowerCase();
    return str.matches(expr);
}
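
The "escape any other potentially problematic characters" step above can be sketched by quoting each literal run between wildcards with Pattern.quote, rather than quoting the whole expression. This is only a minimal sketch under the same assumptions as the answer ('%' for any run of characters, '?' for a single character); the class name LikeRegex and the helper toRegex are made up for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical helper class; the names are illustrative only.
final class LikeRegex {

    // Translates a LIKE-style pattern into a regex.
    // '%' becomes ".*", '?' becomes ".", everything else is quoted as a literal.
    static Pattern toRegex(String likePattern) {
        StringBuilder regex = new StringBuilder();
        StringBuilder literal = new StringBuilder();
        for (int i = 0; i < likePattern.length(); i++) {
            char c = likePattern.charAt(i);
            if (c == '%' || c == '?') {
                // Flush the literal text collected so far, quoted so that
                // regex metacharacters in it lose their special meaning.
                if (literal.length() > 0) {
                    regex.append(Pattern.quote(literal.toString()));
                    literal.setLength(0);
                }
                regex.append(c == '%' ? ".*" : ".");
            } else {
                literal.append(c);
            }
        }
        if (literal.length() > 0) {
            regex.append(Pattern.quote(literal.toString()));
        }
        // CASE_INSENSITIVE replaces the toLowerCase() calls;
        // DOTALL lets the wildcards match line breaks too.
        return Pattern.compile(regex.toString(),
                Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
    }

    static boolean like(String str, String likePattern) {
        return toRegex(likePattern).matcher(str).matches();
    }
}
```

With this sketch, LikeRegex.like("digital", "%gi?a%") matches, while a pattern such as "a.b" only matches the text "a.b" because the dot is treated literally.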
Michael Myers
  • Is there a method which escapes every character with special meaning in Java regex? – Chris May 22 '09 at 15:28
  • Yes, Pattern.quote (http://java.sun.com/javase/6/docs/api/java/util/regex/Pattern.html#quote%28java.lang.String%29 ) will do it. For some reason, I thought that might cause a problem, but now I don't know why I didn't include it in the answer. – Michael Myers May 22 '09 at 15:34
  • Oh yes, now I remember. It's because ? is a special regex character, so it would be escaped before we could replace it. I suppose we could instead use Pattern.quote and then expr = expr.replace("\\?", "."); – Michael Myers May 22 '09 at 15:36
  • You are right. I should have tested it on dots before posting it. – Michael Myers Jul 20 '09 at 20:16
  • You can add also `expr = expr.replaceAll("(?<!\\\\)_", ".");`, because `"\_"` can be escaped in SQL, and should not be replaced with `"."` in this case. (I used `_` instead of `?` for one character.) – True Soft Jul 22 '12 at 12:22
  • Also, for `%`, this replacement would be better: `expr = expr.replaceAll("(?<!\\\\)%", ".*");` – True Soft Jul 22 '12 at 14:25
16

Every SQL reference I can find says the "any single character" wildcard is the underscore (_), not the question mark (?). That simplifies things a bit, since the underscore is not a regex metacharacter. However, you still can't use Pattern.quote() for the reason given by mmyers. I've got another method here for escaping regexes when I might want to edit them afterward. With that out of the way, the like() method becomes pretty simple:

public static boolean like(final String str, final String expr)
{
  String regex = quotemeta(expr);
  regex = regex.replace("_", ".").replace("%", ".*?");
  Pattern p = Pattern.compile(regex,
      Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
  return p.matcher(str).matches();
}

public static String quotemeta(String s)
{
  if (s == null)
  {
    throw new IllegalArgumentException("String cannot be null");
  }

  int len = s.length();
  if (len == 0)
  {
    return "";
  }

  StringBuilder sb = new StringBuilder(len * 2);
  for (int i = 0; i < len; i++)
  {
    char c = s.charAt(i);
    if ("[](){}.*+?$^|#\\".indexOf(c) != -1)
    {
      sb.append("\\");
    }
    sb.append(c);
  }
  return sb.toString();
}

If you really want to use ? for the wildcard, your best bet would be to remove it from the list of metacharacters in the quotemeta() method. Replacing its escaped form -- replace("\\?", ".") -- wouldn't be safe because there might be backslashes in the original expression.

And that brings us to the real problems: most SQL flavors seem to support character classes in the forms [a-z] and [^j-m] or [!j-m], and they all provide a way to escape wildcard characters. The latter is usually done by means of an ESCAPE keyword, which lets you define a different escape character every time. As you can imagine, this complicates things quite a bit. Converting to a regex is probably still the best option, but parsing the original expression will be much harder--in fact, the first thing you would have to do is formalize the syntax of the LIKE-like expressions themselves.
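
As a rough illustration of the escape handling described above, here is a minimal sketch that walks the LIKE expression one character at a time and takes the escape character as a parameter, similar in spirit to the ESCAPE keyword. It is an assumption-laden sketch, not a full implementation: the class name EscapedLike is hypothetical, character classes such as [a-z] are not supported, and matching is case-sensitive.

```java
import java.util.regex.Pattern;

// Hypothetical helper class; the names are illustrative only.
final class EscapedLike {

    // Converts a LIKE expression ('%' and '_' wildcards) into a regex string.
    // A character preceded by the escape character is always taken literally.
    static String toRegex(String expr, char escape) {
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < expr.length(); i++) {
            char c = expr.charAt(i);
            if (c == escape) {
                if (i + 1 >= expr.length()) {
                    throw new IllegalArgumentException("Dangling escape character");
                }
                // Skip the escape character and quote the character after it.
                regex.append(Pattern.quote(String.valueOf(expr.charAt(++i))));
            } else if (c == '%') {
                regex.append(".*");
            } else if (c == '_') {
                regex.append('.');
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return regex.toString();
    }

    static boolean like(String str, String expr, char escape) {
        return Pattern.compile(toRegex(expr, escape), Pattern.DOTALL)
                .matcher(str)
                .matches();
    }
}
```

For example, EscapedLike.like("100%", "100!%", '!') treats the percent sign as a literal because it follows the chosen escape character.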

Alan Moore
  • if(s == null) throw new IllegalArgumentException("String cannot be null"); else if(s.isEmpty()) return ""; – Leo Dec 21 '16 at 18:36
8

To implement simple SQL LIKE patterns in Java you don't need regular expressions. They can be written as:

String text = "apple";
text.startsWith("app"); // like "app%"
text.endsWith("le"); // like "%le"
text.contains("ppl"); // like "%ppl%"
Mithun Adhikari
  • This is essentially just a repeat of [this existing answer posted many years ago](https://stackoverflow.com/a/1149905). – Pang Jun 26 '17 at 01:07
  • Oh really? And what if the text was "I like apples but not oranges" and the search is something like "%oranges%apples%"? – Christian Jun 07 '20 at 15:48
3
public static boolean like(String toBeCompare, String by) {
    // A null text or a null pattern never matches.
    if (by == null || toBeCompare == null) {
        return false;
    }
    // Compare case-insensitively in every branch, not only for "%abc%".
    String text = toBeCompare.toLowerCase();
    String pattern = by.toLowerCase();
    // Only a leading and/or trailing % is supported; strip it to get the literal part.
    String literal = pattern.replace("%", "");
    if (pattern.startsWith("%") && pattern.endsWith("%")) {
        return text.contains(literal);    // like "%abc%"
    } else if (pattern.startsWith("%")) {
        return text.endsWith(literal);    // like "%abc"
    } else if (pattern.endsWith("%")) {
        return text.startsWith(literal);  // like "abc%"
    } else {
        return text.equals(literal);      // like "abc"
    }
}

Maybe this will help you.

Krishnendu
3

Java strings have .startsWith() and .contains() methods which will get you most of the way. For anything more complicated you'd have to use regex or write your own method.

job
3

You could turn '%string%' into contains(), 'string%' into startsWith() and '%string' into endsWith().

You should also run toLowerCase() on both the string and the pattern, as LIKE is case-insensitive in many databases (this depends on the collation).

Not sure how you'd handle '%string%other%' except with a Regular Expression though.
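
For what it's worth, a pattern such as '%string%other%' can also be handled without a regular expression by splitting on '%' and looking for the pieces in order. The following is only a sketch under stated assumptions: the class name SegmentLike is made up, only the '%' wildcard is supported (no single-character wildcard, no escaping), and the comparison is case-sensitive.

```java
// Hypothetical helper class; the names are illustrative only.
final class SegmentLike {

    static boolean like(String text, String pattern) {
        // The limit -1 keeps leading and trailing empty pieces, so "%a%"
        // splits into ["", "a", ""] rather than ["", "a"].
        String[] parts = pattern.split("%", -1);
        if (parts.length == 1) {
            return text.equals(pattern); // no wildcard at all
        }
        String first = parts[0];
        String last = parts[parts.length - 1];
        // The first piece is anchored at the start, the last one at the end,
        // and the two must not overlap.
        if (text.length() < first.length() + last.length()
                || !text.startsWith(first) || !text.endsWith(last)) {
            return false;
        }
        int pos = first.length();
        int limit = text.length() - last.length();
        // Every middle piece must occur, in order, between prefix and suffix.
        for (int i = 1; i < parts.length - 1; i++) {
            int found = text.indexOf(parts[i], pos);
            if (found < 0 || found + parts[i].length() > limit) {
                return false;
            }
            pos = found + parts[i].length();
        }
        return true;
    }
}
```

Taking the leftmost occurrence of each middle piece is enough here, because '%' can absorb any amount of text in between.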

If you're using Regular Expressions:

Community
David Webb
  • What about "%this%string%"? Split on the '%' sign, iterate over the array and then check for every entry? I think this could be done better ... – Chris May 22 '09 at 15:23
2

Apache Cayenne ORM has "in-memory evaluation".

It may not work for unmapped objects, but it looks promising:

Expression exp = ExpressionFactory.likeExp("artistName", "A%");   
List startWithA = exp.filterObjects(artists); 
OscarRyz
  • Do you know if Hibernate supports this feature? I mean, filtering objects currently in memory using such an expression? – tommyL Jul 21 '09 at 11:28
2

The Comparator and Comparable interfaces are likely inapplicable here. They deal with sorting, and return integers of either sign, or 0. Your operation is about finding matches, and returning true/false. That's different.

John
  • You are welcome to suggest a better name for the operator. I don't like criticism without suggestions for improvement, btw. – Chris Aug 18 '09 at 12:58
2

http://josql.sourceforge.net/ has what you need. Look for org.josql.expressions.LikeExpression.

Rich MacDonald
1

I don't know exactly about the greedy issue, but try this and see if it works for you:

public boolean like(final String str, String expr)
  {
    final String[] parts = expr.split("%");
    final boolean traillingOp = expr.endsWith("%");
    expr = "";
    for (int i = 0, l = parts.length; i < l; ++i)
    {
      final String[] p = parts[i].split("\\\\\\?");
      if (p.length > 1)
      {
        for (int y = 0, l2 = p.length; y < l2; ++y)
        {
          expr += p[y];
          if (y + 1 < l2) expr += "."; // join the inner pieces with '.', so test the inner index y
        }
      }
      else
      {
        expr += parts[i];
      }
      if (i + 1 < l) expr += "%";
    }
    if (traillingOp) expr += "%";
    expr = expr.replace("?", ".");
    expr = expr.replace("%", ".*");
    return str.matches(expr);
}
tommyL
  • Your inner split() and loop replaces any \? sequence with a dot--I don't get that. Why single out that sequence, only to replace it with a dot just like a lone question mark? – Alan Moore Jul 20 '09 at 08:38
  • It replaces the '?' with a '.' because '?' is a placeholder for a single arbitrary character. I know '\\\\\\?' looks strange, but I tested it and for my tests it seems to work. – tommyL Jul 20 '09 at 10:32
1

I had a similar requirement. The code below may help, with some modifications:

package codeSamplesWithoutMaven;

public class TestLikeInJava {

public static void main(String[] args) {
    String fromDb = "erick@gmail.com";
    String str1 = "*gmail*";
    String str2 = "*erick";
    String str3 = "*rick";
    String str4 = "*.com";
    String str5 = "erick*";
    String str6 = "ck@gmail*";
    System.out.println(belongsToStringWithWildcards(str1, fromDb));
    System.out.println(belongsToStringWithWildcards(str2, fromDb));
    System.out.println(belongsToStringWithWildcards(str3, fromDb));
    System.out.println(belongsToStringWithWildcards(str4, fromDb));
    System.out.println(belongsToStringWithWildcards(str5, fromDb));
    System.out.println(belongsToStringWithWildcards(str6, fromDb));
}

private static Boolean belongsToStringWithWildcards(String strToTest, String targetStr) {
    Boolean result = Boolean.FALSE;
    int type = 0; //1:start, 2:end, 3:both
    if (strToTest.startsWith("*") && strToTest.endsWith("*")) {
        type = 3;
    } else {
        if (strToTest.startsWith("*")) {
            type = 1;
        } else if (strToTest.endsWith("*")) {
            type = 2;
        }
    }
    System.out.println("strToTest " + strToTest + " into " + targetStr + " type " + type);
    strToTest = strToTest.replaceAll("[*]", "");
    System.out.println("strToTest " + strToTest + " into " + targetStr + " type " + type);
    switch (type) {
        case 1: result = targetStr.endsWith(strToTest);  
                break;
        case 2: result = targetStr.startsWith(strToTest);
                break;
        case 3: result = targetStr.contains(strToTest);
                break;
    }
    return result;
}

}

0
public static boolean like(String source, String exp) {
        if (source == null || exp == null) {
            return false;
        }

        int sourceLength = source.length();
        int expLength = exp.length();

        if (sourceLength == 0 || expLength == 0) {
            return false;
        }

        boolean fuzzy = false;
        char lastCharOfExp = 0;
        int positionOfSource = 0;

        for (int i = 0; i < expLength; i++) {
            char ch = exp.charAt(i);

            // Is the current character escaped?
            boolean escape = false;
            if (lastCharOfExp == '\\') {
                if (ch == '%' || ch == '_') {
                    escape = true;
                    // System.out.println("escape " + ch);
                }
            }

            if (!escape && ch == '%') {
                fuzzy = true;
            } else if (!escape && ch == '_') {
                if (positionOfSource >= sourceLength) {
                    return false;
                }

                positionOfSource++;// <<<----- advance by one
            } else if (ch != '\\') {// any other character, excluding the escape character
                if (positionOfSource >= sourceLength) {// already past the end of source
                    return false;
                }

                if (lastCharOfExp == '%') { // the previous character was %, which needs special handling
                    int tp = source.indexOf(ch);
                    // System.out.println("previous char=%, current char=" + ch + ",position=" + position + ",tp=" + tp);

                    if (tp == -1) { // this character cannot be matched, so give up
                        return false;
                    }

                    if (tp >= positionOfSource) {
                        positionOfSource = tp + 1;// <<<----- continue from here

                        if (i == expLength - 1 && positionOfSource < sourceLength) { // exp is at its last character; check whether source is at its last character too
                            return false;
                        }
                    } else {
                        return false;
                    }
                } else if (source.charAt(positionOfSource) == ch) {// found ch at this position
                    positionOfSource++;
                } else {
                    return false;
                }
            }

            lastCharOfExp = ch;// <<<----- remember the previous character
            // System.out.println("current char=" + ch + ",position=" + position);
        }

        // All characters of exp have been processed; if the match is not fuzzy, check whether the matched position reached the end of source
        if (!fuzzy && positionOfSource < sourceLength) {
            // System.out.println("previous char=" + lastChar + ",position=" + position );

            return false;
        }

        return true;// return true here
    }
Assert.assertEquals(true, like("abc_d", "abc\\_d"));
        Assert.assertEquals(true, like("abc%d", "abc\\%%d"));
        Assert.assertEquals(false, like("abcd", "abc\\_d"));

        String source = "1abcd";
        Assert.assertEquals(true, like(source, "_%d"));
        Assert.assertEquals(false, like(source, "%%a"));
        Assert.assertEquals(false, like(source, "1"));
        Assert.assertEquals(true, like(source, "%d"));
        Assert.assertEquals(true, like(source, "%%%%"));
        Assert.assertEquals(true, like(source, "1%_"));
        Assert.assertEquals(false, like(source, "1%_2"));
        Assert.assertEquals(false, like(source, "1abcdef"));
        Assert.assertEquals(true, like(source, "1abcd"));
        Assert.assertEquals(false, like(source, "1abcde"));

        // The following cases are quite representative
        Assert.assertEquals(true, like(source, "_%_"));
        Assert.assertEquals(true, like(source, "_%____"));
        Assert.assertEquals(true, like(source, "_____"));// 5 underscores
        Assert.assertEquals(false, like(source, "___"));// 3 underscores
        Assert.assertEquals(false, like(source, "__%____"));// 6 underscores
        Assert.assertEquals(false, like(source, "1"));

        Assert.assertEquals(false, like(source, "a_%b"));
        Assert.assertEquals(true, like(source, "1%"));
        Assert.assertEquals(false, like(source, "d%"));
        Assert.assertEquals(true, like(source, "_%"));
        Assert.assertEquals(true, like(source, "_abc%"));
        Assert.assertEquals(true, like(source, "%d"));
        Assert.assertEquals(true, like(source, "%abc%"));
        Assert.assertEquals(false, like(source, "ab_%"));

        Assert.assertEquals(true, like(source, "1ab__"));
        Assert.assertEquals(true, like(source, "1ab__%"));
        Assert.assertEquals(false, like(source, "1ab___"));
        Assert.assertEquals(true, like(source, "%"));

        Assert.assertEquals(false, like(null, "1ab___"));
        Assert.assertEquals(false, like(source, null));
        Assert.assertEquals(false, like(source, ""));
test love
0

Check out https://github.com/hrakaroo/glob-library-java.

It's a zero dependency library in Java for doing glob (and sql like) type of comparisons. Over a large data set it is faster than translating to a regular expression.

Basic syntax

MatchingEngine m = GlobPattern.compile("dog%cat\\%goat_", '%', '_', GlobPattern.HANDLE_ESCAPES);
if (m.matches(str)) { ... }
0

This is my take on it. It's in Kotlin but can be converted to Java with little effort:

import java.util.regex.Pattern

val percentageRegex = Regex("""(?<!\\)%""")
val underscoreRegex = Regex("""(?<!\\)_""")

infix fun String.like(predicate: String): Boolean {

    //Split the text by every % not preceded by a slash.
    //We transform each slice before joining them with .* as a separator.
    return predicate.split(percentageRegex).joinToString(".*") { percentageSlice ->

        //Split the text by every _ not preceded by a slash.
        //We transform each slice before joining them with . as a separator.
        percentageSlice.split(underscoreRegex).joinToString(".") { underscoreSlice ->

            //Each slice is wrapped in "Regex quotes" to ignore all
            // the metacharacters they contain.
            //We also remove the slashes from the escape sequences
            // since they are now treated literally.
            Pattern.quote(
                underscoreSlice.replace("\\_", "_").replace("\\%", "%")
            )
        }

    }.let { "^$it$" }.toRegex().matches(this@like)
}

It might not be the most performant of all the solutions here, but it's probably the most accurate.

It treats every regex metacharacter other than % and _ literally, and it also supports escaping those two with a backslash.

Ahmed Mourad
-1

from https://www.tutorialspoint.com/java/java_string_matches.htm

import java.io.*;
public class Test {

   public static void main(String args[]) {
      String Str = new String("Welcome to Tutorialspoint.com");

      System.out.print("Return Value :" );
      System.out.println(Str.matches("(.*)Tutorials(.*)"));

      System.out.print("Return Value :" );
      System.out.println(Str.matches("Tutorials"));

      System.out.print("Return Value :" );
      System.out.println(Str.matches("Welcome(.*)"));
   }
}
Wai Ha Lee
  • From [How to reference material written by others](https://stackoverflow.com/help/referencing): "*Do not copy the complete text of sources; instead, use their words and ideas to support your own. In particular, answers comprised entirely of a quote (sourced or not) will often be deleted since they do not contain any original content.*" – Wai Ha Lee Dec 14 '22 at 14:06
-3

Ok this is a bit of a weird solution, but I thought it should still be mentioned.

Instead of recreating the like mechanism we can utilize the existing implementation already available in any database!

(Only requirement is, your application must have access to any database).

Just build a very simple query each time that returns true or false depending on the result of the LIKE comparison. Then execute the query and read the answer directly from the database!

For Oracle db:

SELECT
CASE 
     WHEN 'StringToSearch' LIKE 'LikeSequence' THEN 'true'
     ELSE 'false'
 END test
FROM dual 

For MS SQL Server

SELECT
CASE 
     WHEN 'StringToSearch' LIKE 'LikeSequence' THEN 'true'
     ELSE 'false'
END test

All you have to do is replace "StringToSearch" and "LikeSequence" with bind parameters and set the values you want to check.
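
On the Java side, the bind-parameter version might look roughly like the sketch below, using plain JDBC. The class name DatabaseLike and the constant names are made up for illustration, and the SQL is the Oracle form from above with both literals replaced by '?'. Some databases may need an explicit cast on the bind parameters, so treat this as an untested outline rather than a drop-in solution.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Hypothetical helper class; the names are illustrative only.
final class DatabaseLike {

    // Oracle form of the query above, with bind parameters.
    static final String ORACLE_SQL =
            "SELECT CASE WHEN ? LIKE ? THEN 'true' ELSE 'false' END test FROM dual";

    // SQL Server form: the same query without "FROM dual".
    static final String SQL_SERVER_SQL =
            "SELECT CASE WHEN ? LIKE ? THEN 'true' ELSE 'false' END test";

    // Lets the database evaluate "value LIKE pattern" and returns the result.
    static boolean like(Connection conn, String sql, String value, String pattern)
            throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setString(1, value);   // the string to search
            ps.setString(2, pattern); // the LIKE pattern
            try (ResultSet rs = ps.executeQuery()) {
                // The query returns exactly one row holding 'true' or 'false'.
                return rs.next() && "true".equals(rs.getString(1));
            }
        }
    }
}
```

A call would then look like DatabaseLike.like(conn, DatabaseLike.ORACLE_SQL, "digital", "%ital%"), where conn is an already opened connection. Each call costs a database round trip.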

Kinnison84