I'm looking for a method that passes the following test cases:
assertEquals(0, indexOfIgnoreCase("ss", "ß"));
assertEquals(0, indexOfIgnoreCase("ß", "ss"));
assertEquals(1, indexOfIgnoreCase("ßa", "a"));
The funny character (called the German "sharp S") is not really exotic (U+00DF, part of the Latin-1 Supplement Unicode block), unless you capitalize it: "ß".toUpperCase() returns "SS" (locale-independent).
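To make the asymmetry concrete, here is a small check of how the JDK case-maps ß. It assumes a Java 8+ runtime and uses Locale.ROOT to rule out locale effects; the class name is just for illustration.

```java
import java.util.Locale;

class SharpSCasing {
    public static void main(String[] args) {
        // Full string uppercasing expands ß into two letters.
        System.out.println("ß".toUpperCase(Locale.ROOT));  // SS
        // Lowercasing leaves ß alone, so "ss" and "ß" never meet in lowercase.
        System.out.println("ß".toLowerCase(Locale.ROOT));  // ß
        // A single char cannot become two chars, so char-level uppercasing keeps ß.
        System.out.println(Character.toUpperCase('ß'));    // ß
        // equalsIgnoreCase works char by char and therefore reports false.
        System.out.println("ß".equalsIgnoreCase("ss"));    // false
    }
}
```

The third line is the root of the problem: every API that folds case one char at a time sees ß as its own uppercase form.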
My search for a solution that works for at least the first 256 Unicode characters turned up nothing but ICU4J, which I don't want to use.
This question (indirectly) asks for a case-insensitive version of String.contains, but note that most of its answers work for ASCII only. The accepted answer can be adapted like this:
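import java.util.regex.Matcher;
import java.util.regex.Pattern;
static int indexOfIgnoreCase(String hay, String needle) {
    final int flags = Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE;
    final Pattern pattern = Pattern.compile(Pattern.quote(needle), flags);
    final Matcher matcher = pattern.matcher(hay);
    return matcher.find() ? matcher.start() : -1;
}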
so that it also works for non-ASCII text and returns the position instead of a boolean. However, it fails the tests above.
Apache's org.apache.commons.lang3.StringUtils doesn't pass either. This nice answer using String.regionMatches provides a fast solution, but fails as well.
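The linked answer isn't reproduced here, so the following is only a hypothetical regionMatches-based search in the same spirit. It illustrates why this family of solutions fails: String.regionMatches(true, ...) compares a fixed number of chars one by one, so the single char ß can never line up with the two chars "ss".

```java
class RegionMatchesIndexOf {
    // Hypothetical search similar in spirit to the linked answer (not its exact code).
    static int indexOfIgnoreCase(String hay, String needle) {
        // Try every start offset; regionMatches compares needle.length() chars pairwise.
        for (int i = 0; i + needle.length() <= hay.length(); i++) {
            if (hay.regionMatches(true, i, needle, 0, needle.length())) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // One needle char is compared with one hay char: 's' never equals 'ß'.
        System.out.println(indexOfIgnoreCase("ss", "ß"));  // -1 (wanted 0)
        // The needle is longer than the hay, so no offset is tried at all.
        System.out.println(indexOfIgnoreCase("ß", "ss"));  // -1 (wanted 0)
        System.out.println(indexOfIgnoreCase("ßa", "a"));  // 1
    }
}
```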
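Converting both strings to lowercase wouldn't suffice, since ß stays ß. Converting to uppercase sort of would, but the last test case would then return 2 instead of 1, because the index refers to the uppercased hay "SSA" rather than to "ßa".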
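A sketch of the uppercase idea with one extra step: remember, for every char of the case-mapped hay, which original index it came from, and translate the hit back. Assumptions: case mapping is done per code point as toUpperCase followed by toLowerCase with Locale.ROOT. This only approximates Unicode full case folding (CaseFolding.txt), though the extra lowercasing does merge a few pairs that uppercasing alone keeps apart, e.g. U+212A KELVIN SIGN and k. Class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class FoldedIndexOf {
    // Naive variant: the hit is an index into the *uppercased* hay.
    static int naiveIndexOf(String hay, String needle) {
        return hay.toUpperCase(Locale.ROOT).indexOf(needle.toUpperCase(Locale.ROOT));
    }

    // Assumed folding: uppercase, then lowercase, one code point at a time (ß -> "ss").
    static String fold(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> sb.append(
                new String(Character.toChars(cp)).toUpperCase(Locale.ROOT).toLowerCase(Locale.ROOT)));
        return sb.toString();
    }

    // Search in the folded hay, then map the hit back to an index in the original hay.
    static int indexOfIgnoreCase(String hay, String needle) {
        StringBuilder folded = new StringBuilder();
        List<Integer> origin = new ArrayList<>(); // origin.get(k): hay index behind folded char k
        for (int i = 0; i < hay.length(); ) {
            int cp = hay.codePointAt(i);
            String f = fold(new String(Character.toChars(cp)));
            folded.append(f);
            for (int k = 0; k < f.length(); k++) {
                origin.add(i);
            }
            i += Character.charCount(cp);
        }
        int hit = folded.indexOf(fold(needle));
        if (hit < 0) {
            return -1;
        }
        // hit == origin.size() only happens for an empty needle in an empty hay.
        return hit < origin.size() ? origin.get(hit) : hay.length();
    }

    public static void main(String[] args) {
        System.out.println(naiveIndexOf("ßa", "a"));       // 2, an index into "SSA"
        System.out.println(indexOfIgnoreCase("ss", "ß"));  // 0
        System.out.println(indexOfIgnoreCase("ß", "ss"));  // 0
        System.out.println(indexOfIgnoreCase("ßa", "a"));  // 1
    }
}
```

With this convention, indexOfIgnoreCase("ßa", "sa") returns 0, the index of the ß inside which the match begins.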
I'm a bit unsure what the result of indexOfIgnoreCase("ßa", "sa") should be. 0.5, as the "needle" starts at the second S from the capitalization of ß?
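Since an int can't hold 0.5, one plausible reading is a choice between two conventions: report the original character in which the match starts (0), or reject matches that begin or end in the middle of an expanded character (-1). Below is a sketch of the stricter convention, assuming the same per-code-point folding (toUpperCase then toLowerCase with Locale.ROOT, only an approximation of full case folding); the class name is hypothetical.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

class BoundaryIndexOf {
    // Assumed folding: uppercase, then lowercase, one code point at a time.
    static String fold(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> sb.append(
                new String(Character.toChars(cp)).toUpperCase(Locale.ROOT).toLowerCase(Locale.ROOT)));
        return sb.toString();
    }

    // Accept a hit only if it starts and ends where the folded text of a whole
    // original code point starts/ends, so "sa" cannot match half of a ß.
    static int indexOfIgnoreCase(String hay, String needle) {
        StringBuilder folded = new StringBuilder();
        Map<Integer, Integer> boundary = new HashMap<>(); // folded offset -> hay index
        for (int i = 0; i < hay.length(); ) {
            int cp = hay.codePointAt(i);
            boundary.put(folded.length(), i);
            folded.append(fold(new String(Character.toChars(cp))));
            i += Character.charCount(cp);
        }
        boundary.put(folded.length(), hay.length());
        String f = fold(needle);
        for (int hit = folded.indexOf(f); hit >= 0; hit = folded.indexOf(f, hit + 1)) {
            if (boundary.containsKey(hit) && boundary.containsKey(hit + f.length())) {
                return boundary.get(hit);
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(indexOfIgnoreCase("ßa", "sa"));   // -1: would start inside ß
        System.out.println(indexOfIgnoreCase("ßa", "ssa"));  // 0
        System.out.println(indexOfIgnoreCase("ßa", "a"));    // 1
    }
}
```

The three original test cases still pass under this convention; only matches that split an expansion are affected.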