The German language sometimes uses words of absurd length. As long as they are constant at runtime, this is not an issue: careful designers will just apply word wrap where syntactical rules allow. Now we have an application where labels are user-configurable at runtime. I'm aware of 10 good reasons why this is a bad idea; possible solutions need not address this, as it is a business requirement.
We wrote a translation tool based on a SQL Server database where both the UX team and the translators can look up strings by partial token matching, implemented simply as
SELECT Literal FROM Translations WHERE LCID=N'de' AND Literal LIKE N'%'+@token+N'%'
I know that this is not SARGable; at the moment it performs reasonably well, so a possible solution does not need to address this either.
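For context, a minimal sketch of the setup; everything beyond the Translations/Literal/LCID names visible above is an assumption on my part:

CREATE TABLE dbo.Translations (
    LCID    NVARCHAR(10)  NOT NULL,
    Literal NVARCHAR(400) NOT NULL
);

-- the lookup from above; the leading wildcard is what prevents index seeks
DECLARE @token NVARCHAR(100) = N'fortzahlung';
SELECT Literal FROM dbo.Translations
WHERE LCID = N'de' AND Literal LIKE N'%' + @token + N'%';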
Translators have come up with the habit of including zero-width spaces (U+200B) as hyphenation points in texts like this:
Entgeltfortzahlungsanspruch
You can actually copy and paste the spaces from that string; there are two, one after "Entgelt" and one before "anspruch".
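In case the invisible characters do not survive copy and paste, the stored value can be reconstructed explicitly; NCHAR(8203) produces U+200B, ZERO WIDTH SPACE:

-- builds the literal with its two embedded zero-width spaces
SELECT N'Entgelt' + NCHAR(8203) + N'fortzahlungs' + NCHAR(8203) + N'anspruch' AS Literal;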
At runtime, this works like a charm. At design time, it has become impossible to copy and paste "Entgeltfortzahlungsanspruch" (without the invisible spaces!) into the search box of the tool and get results, because the LIKE operator does not ignore these spaces. This has become a bit of a problem, because team members keep adding the same string, thinking it has not been included in the glossary yet.
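A self-contained repro of the mismatch; the variable names are mine, and I force a binary collation so the result does not depend on whether the database collation assigns U+200B a sort weight:

DECLARE @stored NVARCHAR(100) =
    N'Entgelt' + NCHAR(8203) + N'fortzahlungs' + NCHAR(8203) + N'anspruch';
DECLARE @typed NVARCHAR(100) = N'Entgeltfortzahlungsanspruch';  -- what a team member pastes

SELECT LEN(@stored) AS StoredLen,  -- 29: 27 letters plus two U+200B
       LEN(@typed)  AS TypedLen,   -- 27
       CASE WHEN @stored COLLATE Latin1_General_BIN2
                 LIKE N'%' + @typed + N'%' THEN 1 ELSE 0 END AS IsMatch;  -- 0: no hit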
Is there some LIKE syntax jugglery, regex magic, or other ingenious algorithm that would allow ignoring zero-width spaces in searches while still preserving them in the data?
Previous research: of the four questions that come up when searching for "tsql" and "zero-width space", this one is partly relevant; we have actually implemented part of it, because the application can already deal with accent sensitivity/insensitivity. But I could not find any information hinting that collations or TRIMming would help with this kind of inner whitespace.
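To illustrate why TRIM cannot help here: the optional characters FROM form (SQL Server 2017 and later) only strips from the ends of the string, so the inner zero-width spaces survive:

DECLARE @s NVARCHAR(100) =
    NCHAR(8203) + N'Entgelt' + NCHAR(8203) + N'anspruch' + NCHAR(8203);

SELECT LEN(@s) AS BeforeLen,                        -- 18: 15 letters plus three U+200B
       LEN(TRIM(NCHAR(8203) FROM @s)) AS AfterLen;  -- 16: only the outer two are gone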