Regex match() with JavaScript Serbian Latin letters

Question

I need to match text for a client-side search. I have this regular expression:

const regex = /zitiste/g;

and I need it to match this string:

const place = "žitište";
place.match(regex);

This returns null. The same happens on regex101:

https://regex101.com/r/Tk7tKy/2

"zitiste" does not match "žitište".

So is it even possible to match z with ž using a regular expression? I have read about 100 pages on regular expressions but can't work out whether this is possible.

I think you have to use the \x00-\x7F range of characters to find non-ASCII characters in your string. If I enter [^\x00-\x7F]+ into that linked page, it says that there are 2 matches: ž and š – ATD, Sep 21 '20 at 15:08
`z` will not match `ž`. Do you want to match both `zitiste` and `žitište`? – Ankit, Sep 21 '20 at 15:08
Yes m8. In my array it's žitište, and I want to match it with both zitiste and žitište. The pattern comes from an input; I build it like this: const regex = new RegExp("(?:" + search.join("|") + ")", "gi"); – Milan Djekic, Sep 21 '20 at 15:14
Leaning towards @T.J. Crowder's answer, this might help: https://stackoverflow.com/questions/1453171/remove-diacritical-marks-%c5%84-%c7%b9-%c5%88-%c3%b1-%e1%b9%85-%c5%86-%e1%b9%87-%e1%b9%8b-%e1%b9%89-%cc%88-%c9%b2-%c6%9e-%e1%b6%87-%c9%b3-%c8%b5-from-unicode-chars. Granted, it's for Java, but it shouldn't be too hard to borrow the concept. – Gary, Sep 21 '20 at 15:33

T.J. Crowder · Answer 1 · 2020-09-21T15:22:30.133

You can match either z or ž (and the same with s/š):

const regex = /[zž]iti[sš]te/gi;

Live Example:

const regex = /[zž]iti[sš]te/gi;
console.log("žitište".match(regex));
console.log("žitiste".match(regex));
console.log("Zitište".match(regex));
console.log("Zitiste".match(regex));

[zž] means "z or ž."

Obviously you'd include other alternatives for any other letters you wanted to allow both with and without diacritical marks.
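As a rough sketch of what "other alternatives" could look like, the snippet below builds the character classes from a list of letter groups and turns a plain search term (not a regex) into a loose pattern. The helper name `looseRegex` and the choice of groups (c/č/ć, d/đ, s/š, z/ž) are assumptions for illustration; adjust the groups to whatever your data needs.

```javascript
// Assumption: these are the letters to treat as interchangeable.
const groups = ["cčć", "dđ", "sš", "zž"];

// Build a lookup from each letter to its character class, e.g. "ž" -> "[zž]".
const looseMap = {};
for (const group of groups) {
    for (const ch of group) {
        looseMap[ch] = `[${group}]`;
    }
}

// Hypothetical helper: turn a plain search term into a diacritic-tolerant RegExp.
function looseRegex(term) {
    const source = Array.from(term.toLowerCase(), ch =>
        // Mapped letters become a class; everything else is escaped literally.
        looseMap[ch] || ch.replace(/[.*+?^${}()|[\]\\]/g, "\\$&")
    ).join("");
    return new RegExp(source, "gi");
}

console.log("žitište".match(looseRegex("zitiste"))); // one match: žitište
console.log("Čačak".match(looseRegex("cacak")));     // one match: Čačak
```

Because every unmapped character is escaped, this version only suits literal search terms; a user-supplied regex needs the source-rewriting approach shown further down.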

I was hoping that you might be able to use the new Unicode property escapes feature to search for anything in Serbian script, but it doesn't look like it gets its own category. :-(
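Property escapes can still help in a different way from the character-class approach: instead of loosening the pattern, you can strip the diacritics from both the search term and the text before comparing them. The following is a minimal sketch of that alternative, assuming an environment that supports `String.prototype.normalize` and the `u` flag. The helper name `fold` is made up, and mapping đ to d is an assumption (some people would prefer "dj").

```javascript
// Hypothetical helper: reduce text to lowercase letters without diacritics.
function fold(text) {
    return text
        .normalize("NFD")        // "ž" becomes "z" + U+030C (combining caron)
        .replace(/\p{M}/gu, "")  // drop all combining marks
        .replace(/đ/g, "d")      // đ/Đ do not decompose, so map them by hand
        .replace(/Đ/g, "D")
        .toLowerCase();
}

console.log(fold("žitište"));                           // "zitiste"
console.log(fold("Žitište").includes(fold("zitiste"))); // true
```

This keeps the search pattern untouched, at the cost of folding every string you search through.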

Here's an example where you get the regular expression from an input, loosen it to allow characters either with or without diacritical marks (in this case only the z and s as in your question, but you'll want to add the full list):

// The substitutions to make
const map = {
    "z": "[zž]",
    "ž": "[zž]",
    "s": "[sš]",
    "š": "[sš]",
};
document.getElementById("btn-check").addEventListener("click", function() {
    let rexText = document.getElementById("regex").value;
    // Skip escape sequences such as \s so only literal letters are loosened
    rexText = rexText.replace(/\\.|[zžsš]/g, ch => map[ch] || ch);
    const rex = new RegExp(rexText, "gi");
    const text = document.getElementById("input").value;
    const result = text.match(rex);
    console.log(`Matching text "${text}" against ${rex}: ${result}`);
});

<div>
    <label>
        Regex:
        <input type="text" id="regex" value="zitiste">
    </label>
</div>
<div>
    <label>
        Input to match against:
        <input type="text" id="input" value="žitište">
    </label>
</div>
<input type="button" value="Check" id="btn-check">
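The same idea can be applied to the pattern built from a `search` array, as mentioned in the comments. This is only a sketch without the DOM: the `search` and `places` values are made-up sample data, `loosen` is a hypothetical helper name, and the map covers only z and s as in the example above.

```javascript
// Same substitutions as in the example above (only z and s here).
const map = { "z": "[zž]", "ž": "[zž]", "s": "[sš]", "š": "[sš]" };

// Loosen a regex source string, leaving escape sequences such as \s untouched.
function loosen(source) {
    return source.replace(/\\.|[zžsš]/g, m => map[m] || m);
}

// Hypothetical data standing in for the search terms and the places to filter.
const search = ["zitiste", "novi sad"];
const places = ["Žitište", "Novi Sad", "Beograd"];

// No "g" flag here, so test() does not carry lastIndex between calls.
const regex = new RegExp("(?:" + search.map(loosen).join("|") + ")", "i");
const hits = places.filter(place => regex.test(place));
console.log(hits); // Žitište and Novi Sad
```

Loosening each term before joining keeps the `|` alternation intact and means the rest of the search code does not need to change.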

edited Sep 21 '20 at 15:22

answered Sep 21 '20 at 15:06

T.J. Crowder

I don't think this is what the OP wants to achieve in reality. The example is just boiled down to the bare minimum ... maybe the OP's Q. already has an answer (the unicode hint) on a similar problem like [RegEx with extended latin alphabet \(ä ö ü è ß\)](https://stackoverflow.com/questions/11704182/regex-with-extended-latin-alphabet-%C3%A4-%C3%B6-%C3%BC-%C3%A8-%C3%9F) – Peter Seliger Sep 21 '20 at 15:10
This is a good direction, but const regex = /zitiste/g; comes from an input, and it's totally dynamic. I build it like this: const regex = new RegExp("(?:" + search.join("|") + ")", "gi"); – Milan Djekic Sep 21 '20 at 15:10
@MilanDjekic - You could still modify it before applying it, adding in loose matching for letters with and without the diacritics. – T.J. Crowder Sep 21 '20 at 15:13
@T.J. Crowder I will definitely do this if there are no better options. – Milan Djekic Sep 21 '20 at 15:16
@MilanDjekic - I've added an example. :-) – T.J. Crowder Sep 21 '20 at 15:22
@T.J. Crowder That's it, m8 :) Thanks a lot. I would give you +1, but I don't have enough reputation points. – Milan Djekic Sep 21 '20 at 15:33
@MilanDjekic - No worries, not in it for the rep. :-) (What would be the point, at ~800k? ;-) ) Hope that helps! – T.J. Crowder Sep 21 '20 at 15:35
