
As shown in this question, Python's `regex` module offers a neat and concise way to fuzzy match one string against the start of a second string (allowing up to x character changes).

In the following code snippet, x=1 (see `e<=1`). The first string is `amazing`, and the second string is `amagingfiller`.

>>> import regex
>>> regex.match('(amazing){e<=1}', 'amagingfiller')
<regex.Match object; span=(0, 7), match='amaging', fuzzy_counts=(1, 0, 0)>

`amazing` matches `amaging` because `amaging` is within 1 change of `amazing`. The trailing `filler` is ignored entirely. This is the expected behaviour.

Question 1: Is there any equivalent functionality in Java's regex library?

Question 2: If not, what's an alternative way to solve this?

Ian
  • Why are you calling it "fuzzy split" if you're not splitting anything? Don't you mean "fuzzy match"? – 41686d6564 stands w. Palestine Nov 14 '20 at 16:19
  • Updated to make it clearer. – Ian Nov 14 '20 at 16:21
  • For my use case I want to use `amazing` to split `amagingfiller` into `amaging` and `filler`, hence why I used the term split. Agree "match" makes more sense in the context of the question though. – Ian Nov 14 '20 at 16:22
  • Note that it's not just for strings of the same length. I'd also expect `amazing` to match `amazzing` in `amazzingfiller` (edit distance=1), for instance. – Ian Nov 14 '20 at 16:24
  • I don't know if there's another regex implementation that offers this functionality but you may consider checking [Levenshtein distance](https://en.wikipedia.org/wiki/Levenshtein_distance). – 41686d6564 stands w. Palestine Nov 14 '20 at 16:31
  • Thanks, I'm aware of Levenshtein but was hoping there might be an oven-ready solution for the problem :) – Ian Nov 14 '20 at 16:34
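
The Levenshtein suggestion in the comments could be sketched in plain Java roughly as follows. This is not an equivalent of `regex`'s `{e<=1}` syntax, only a minimal hand-rolled approximation of the prefix case described in the question. The class name `FuzzyPrefixMatcher` and the method `matchLength` are hypothetical names chosen for illustration, and the tie-breaking rule (prefer the smallest edit distance, then the shortest prefix) is an assumption that may differ from what Python's `regex` returns in other cases.

```java
import java.util.OptionalInt;

// Hypothetical helper, not part of any library: fuzzy-matches `pattern`
// against the start of `text` using a standard Levenshtein DP table.
final class FuzzyPrefixMatcher {

    /**
     * Returns the length of the prefix of text with the smallest edit
     * distance to pattern, or empty if no prefix is within maxEdits.
     * Ties go to the shortest prefix (an assumption of this sketch).
     */
    static OptionalInt matchLength(String pattern, String text, int maxEdits) {
        int m = pattern.length();
        // A prefix longer than m + maxEdits needs more than maxEdits deletions.
        int n = Math.min(text.length(), m + maxEdits);

        // prev[j] = distance between pattern[0..i-1) and text[0..j)
        int[] prev = new int[n + 1];
        int[] curr = new int[n + 1];
        for (int j = 0; j <= n; j++) {
            prev[j] = j;
        }

        for (int i = 1; i <= m; i++) {
            curr[0] = i;
            for (int j = 1; j <= n; j++) {
                int cost = pattern.charAt(i - 1) == text.charAt(j - 1) ? 0 : 1;
                int insertOrDelete = Math.min(curr[j - 1], prev[j]) + 1;
                curr[j] = Math.min(insertOrDelete, prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }

        // prev now holds the distances from the whole pattern to each prefix.
        int bestLength = -1;
        int bestDistance = maxEdits + 1;
        for (int j = 0; j <= n; j++) {
            if (prev[j] < bestDistance) {
                bestDistance = prev[j];
                bestLength = j;
            }
        }
        return bestLength >= 0 ? OptionalInt.of(bestLength) : OptionalInt.empty();
    }

    public static void main(String[] args) {
        String text = "amagingfiller";
        OptionalInt len = matchLength("amazing", text, 1);
        if (len.isPresent()) {
            int cut = len.getAsInt();
            // prints "amaging | filler"
            System.out.println(text.substring(0, cut) + " | " + text.substring(cut));
        }
    }
}
```

Because the method returns the length of the matched prefix, the "split" use case from the comments falls out of two `substring` calls, as in `main`. The table is capped at `pattern.length() + maxEdits` columns, so the cost does not grow with the length of the remaining text.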

0 Answers