I have a list with the following keywords:

["mark", "anthony", "joseph smith", "michael",...]

I want to create a regex delimited by a | so that I have something like:

mark|anthony|joseph smith|michael

I'm doing this like...

StringJoiner regex = new StringJoiner("|");
words.forEach(regex::add);
String matcher = regex.toString();

However there is an issue when I use this approach. If I have the following strings I want to match on:

"josephsmith is unique"
"joseph smith is unique"

I want both scenarios to match. What can I add to my code to ignore the whitespace?

I was thinking of calling replace(" ", ...) on every keyword to replace the spaces with some sort of regex, but I'm not sure how that would work with escaping characters.

buydadip

2 Answers

Convert your regex string to

mark|anthony|joseph\s*smith|michael

i.e. replace all spaces with \s* to match zero or more whitespace characters. If you want to match ONLY space (0x20) then make it

mark|anthony|joseph *smith|michael
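
As a quick check, here is a minimal sketch that compiles the first pattern and tries it against both strings from the question. The class and method names (WhitespaceTolerantMatch, containsKeyword) are made up for illustration, and the use of find() assumes the keyword may appear anywhere in the input.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class WhitespaceTolerantMatch {
    // The alternation from above as a Java string literal:
    // every regex backslash has to be doubled in source code.
    static final Pattern KEYWORDS =
            Pattern.compile("mark|anthony|joseph\\s*smith|michael");

    // Hypothetical helper: true if any keyword occurs somewhere in the input.
    static boolean containsKeyword(String input) {
        return KEYWORDS.matcher(input).find();
    }

    public static void main(String[] args) {
        for (String input : Arrays.asList("josephsmith is unique",
                                          "joseph smith is unique")) {
            System.out.println(input + " -> " + containsKeyword(input));
        }
    }
}
```

Both inputs should report a match, since \s* accepts zero whitespace characters as well as one or more.
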
Jim Garrison

In addition to the answer by @Jim Garrison, if you want to apply the mentioned regex to every keyword, you could do that with a Stream:

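String regex = words.stream()
    // "\\\\s*" yields the literal text \s* (a backslash in the replacement must itself be escaped)
    .map(word -> word.replaceAll("\\s+", "\\\\s*"))
    .collect(Collectors.joining("|"));

Here \s+ matches one or more whitespace characters, and each run is replaced with the literal string \s*. The replacement is written "\\\\s*" because replaceAll() treats a backslash in the replacement string as an escape character, so "\\s*" would insert s* rather than \s*. (In the final regex, \s* matches zero or more whitespace characters.)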

Also note that replaceAll() takes a regex as its first argument, whereas replace() treats both of its arguments as literal text.
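
On the escaping concern raised in the question: if a keyword could contain regex metacharacters (for example a dot), one option is to quote each whitespace-separated part with Pattern.quote() and join the parts with \s*. The following is only a sketch under that assumption; the class and method names (KeywordRegex, toRegex) are hypothetical and not from the original posts.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class KeywordRegex {
    // Hypothetical helper: builds one alternation from all keywords.
    static String toRegex(List<String> words) {
        return words.stream()
                // split each keyword on whitespace, quote every part so it is
                // matched literally, then allow optional whitespace between parts
                .map(word -> Arrays.stream(word.trim().split("\\s+"))
                        .map(Pattern::quote)
                        .collect(Collectors.joining("\\s*")))
                .collect(Collectors.joining("|"));
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("mark", "anthony", "joseph smith", "michael");
        Pattern pattern = Pattern.compile(toRegex(words));
        System.out.println(pattern.matcher("josephsmith is unique").find());
        System.out.println(pattern.matcher("joseph smith is unique").find());
    }
}
```

Because Collectors.joining() uses its delimiter as plain text, "\\s*" needs no extra escaping here, unlike the replacement argument of replaceAll().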

Lino