
I want to search for a string even if the input does not match its format exactly. For example, "Apple" is the string I am trying to search for, but I input "apple", "ápple", or "Applé". I want to check every character of my input, and if it does not match the string I am searching for, replace it until I get the string "Apple".

    This makes no sense. Just force the user to input Apple, since you will replace it with that anyway. – Tunaki Feb 26 '16 at 21:32
  • @user3659052 My professor says it will make searching more user-friendly. Like Google: if you misspell a word, it corrects the misspelled word and gives you the right answer. But any suggestion would be nice. – Robin Bernardo Feb 26 '16 at 21:38
    Possible duplicate of [What is the equivalent of stringByFoldingWithOptions:locale: in Java?](http://stackoverflow.com/questions/21489289/what-is-the-equivalent-of-stringbyfoldingwithoptionslocale-in-java) – twernt Feb 27 '16 at 02:37

2 Answers


You may be interested in the following code, which looks for a normalized string inside a larger normalized string using java.text.Normalizer. From its documentation:

This class provides the method normalize which transforms Unicode text into an equivalent composed or decomposed form, allowing for easier sorting and searching of text. The normalize method supports the standard normalization forms described in Unicode Standard Annex #15 — Unicode Normalization Forms.
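A small sketch of what that decomposition looks like in practice, assuming Java 8 or later (NormalizeDemo and stripDiacritics are illustrative names, not part of any library). NFD turns the precomposed "á" (U+00E1) into a plain "a" followed by a combining acute accent (U+0301), and that separate accent character is what the \p{Mn} regex then removes:

```java
import java.text.Normalizer;

class NormalizeDemo {
    // Decompose to NFD, then drop combining marks (Unicode category Mn)
    static String stripDiacritics(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD).replaceAll("\\p{Mn}", "");
    }

    public static void main(String[] args) {
        String composed = "\u00e1pple"; // "apple" spelled with a precomposed U+00E1 (a-acute)
        String decomposed = Normalizer.normalize(composed, Normalizer.Form.NFD);

        // NFD splits U+00E1 into 'a' followed by U+0301 COMBINING ACUTE ACCENT
        System.out.println(composed.length());         // 5
        System.out.println(decomposed.length());       // 6
        System.out.println(stripDiacritics(composed)); // apple
    }
}
```

Note that stripping marks does not change case, so "Applé" becomes "Apple", not "apple"; case is handled separately by the case-insensitive search.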

Sample code:

import java.text.Normalizer;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class Ideone
{
    public static void main(String[] args) {
        String[] haystack = {"Apple", "Apple", "Apple"}; // sample input strings
        String[] needle = {"ápple", "apple", "Applé"};   // sample keywords
        for (int i = 0; i < haystack.length; i++) {      // loop through inputs
            System.out.println(
                find(
                      normalize(haystack[i]),            // get the normalized form of input
                      normalize(needle[i])               // get the normalized form of the keyword
                )
            );                                           // prints "true" for each pair
        }
    }

    // Get the string without diacritics: decompose to NFD, then drop combining marks (\p{Mn})
    public static String normalize(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD).replaceAll("\\p{Mn}", "");
    }

    // Checks if a string contains another in a case-insensitive way.
    // Pattern.quote treats the needle as literal text, so user input such as "a.b" or "(x"
    // is not interpreted as regex syntax; UNICODE_CASE extends case folding beyond ASCII.
    public static boolean find(String haystack, String needle) {
        Pattern p = Pattern.compile(Pattern.quote(needle),
                Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
        Matcher m = p.matcher(haystack);
        return m.find();
    }
}
Wiktor Stribiżew
  • This is very close to what I'm trying to do. Thank you! @Wiktor Stribiżew I'm using SQLite and I'm getting the data for the name; even if the user searches with a wrong input, it will still give you the closest data. – Robin Bernardo Feb 26 '16 at 21:52
  • If *closest* means *similar*, you should really start looking for a Levenshtein-distance-based solution. If you are only interested in diacritic-free comparison, my solution should suffice. – Wiktor Stribiżew Feb 26 '16 at 21:54
  • I think it's too complex for me. I'm just a beginner, and I'm doing a hard-coded switch case. Any basic advice? @Wiktor Stribiżew – Robin Bernardo Feb 26 '16 at 22:46
  • Your question is now closed. To reopen it, you need to update it with your non-working code and explain what exactly does not work and how it should work. – Wiktor Stribiżew Feb 27 '16 at 07:51
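For the beginner-level version of the lookup, here is a minimal sketch under some assumptions: the stored names are already loaded into a Java List (in the asker's app they would come from SQLite), "matching" means equal apart from case and accents, and Java 9+ is available for List.of. SimpleLookup, fold, and lookup are hypothetical names. Instead of a hard-coded switch case, it folds the input and each stored name the same way and compares the results:

```java
import java.text.Normalizer;
import java.util.List;
import java.util.Locale;
import java.util.Optional;

class SimpleLookup {
    // Fold a string: NFD, strip combining marks, then lower-case (locale-independent)
    static String fold(String s) {
        String noMarks = Normalizer.normalize(s, Normalizer.Form.NFD).replaceAll("\\p{Mn}", "");
        return noMarks.toLowerCase(Locale.ROOT);
    }

    // Return the first stored name whose folded form equals the folded input
    static Optional<String> lookup(List<String> names, String input) {
        String key = fold(input);
        for (String name : names) {
            if (fold(name).equals(key)) {
                return Optional.of(name);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Hypothetical sample data
        List<String> names = List.of("Apple", "Banana", "Cherry");
        System.out.println(lookup(names, "\u00e1pple").orElse("no match")); // Apple
        System.out.println(lookup(names, "APPL\u00c9").orElse("no match")); // Apple
        System.out.println(lookup(names, "aple").orElse("no match"));       // no match
    }
}
```

This handles "ápple" and "APPLÉ", but a genuine misspelling such as "aple" still finds nothing; that is the case where the Levenshtein-distance approach mentioned above becomes necessary.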

Your question isn't entirely clear, but it sounds like you might be trying to calculate the Levenshtein Distance between two strings. If you do a little research on that term, it should be clear whether or not that's what you need.

In short:

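The Levenshtein distance is the minimum number of single-character deletions, insertions, or substitutions required to transform one string into another. For example, turning "ápple" into "Apple" takes one substitution ("á" to "A"), so the distance is 1.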
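A minimal sketch of the standard dynamic-programming computation in Java. Levenshtein and distance are illustrative names; the sketch compares UTF-16 chars case-sensitively and keeps only two rows of the table instead of the full matrix:

```java
class Levenshtein {
    // Edit distance between a and b using the classic DP recurrence, two rows at a time
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into the first j chars of b takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning the first i chars of a into "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(
                        curr[j - 1] + 1,      // insertion
                        prev[j] + 1),         // deletion
                        prev[j - 1] + cost);  // substitution (free if chars match)
            }
            int[] tmp = prev; // the finished row becomes the previous row
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        System.out.println(distance("\u00e1pple", "Apple")); // 1
        System.out.println(distance("kitten", "sitting"));   // 3
        System.out.println(distance("aple", "Apple"));       // 2
    }
}
```

Lower-casing both strings and running them through the first answer's normalize method before calling distance would make case and accent differences cost nothing, so only real typos add to the score. Picking the stored name with the smallest distance then gives a "closest match".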

Kevin Rak