I am writing a website with a user chat function. At some point a user decided to use diacritics to draw all over everyone's screens.
In response I removed all text that was not in the ASCII character range. I'd like to re-enable UTF-8, but I don't know what to do about the combining marks (Unicode characters that modify the character before them). As you can see from the example below, Stack Overflow doesn't handle this problem either.
Malicious input t̀̀̀̀̀̀̀̀̀̀̀̀̀̀̀̀̀̀̀̀̀̀̀è̀̀̀̀̀̀̀x̀̀̀̀̀̀̀̀̀̀t̀̀̀̀̀̀̀̀̀̀̀̀̀
I feel like only one combining mark per character should be allowed, but that seems like a really excessive thing for me to need to write myself, and I don't know if there are any languages that need two or three combining characters on a single letter. I imagine Korean uses them extensively.
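To make the idea concrete, here is a rough sketch of the kind of filter I have in mind, written in Python with the standard `unicodedata` module. The function name `limit_combining_marks` and the cap of 2 are my own placeholders, not anything standard, and I am only guessing that counting the `Mn` and `Me` categories is the right test. It normalizes to NFC first so that ordinary accented letters become single code points, then drops any marks beyond the cap.

```python
import unicodedata

# Hypothetical cap; I do not know what value is safe for every script.
MAX_MARKS = 2

def limit_combining_marks(text: str, max_marks: int = MAX_MARKS) -> str:
    """Drop combining marks beyond max_marks in any consecutive run."""
    # NFC folds common base+mark pairs (e.g. "e" + U+0301) into one code point,
    # so ordinary accented letters do not count against the cap.
    text = unicodedata.normalize("NFC", text)
    out = []
    run = 0  # number of consecutive marks seen since the last non-mark character
    for ch in text:
        # Mn = nonspacing mark, Me = enclosing mark (assumption: these are
        # the categories that stack on top of a base character).
        if unicodedata.category(ch) in ("Mn", "Me"):
            run += 1
            if run > max_marks:
                continue  # discard the excess mark
        else:
            run = 0
        out.append(ch)
    return "".join(out)
```

Calling `limit_combining_marks("t" + "\u0300" * 20)` leaves the `t` with two grave accents and discards the rest, while plain text passes through unchanged. What I can't tell is whether a fixed cap like this would mangle legitimate text in some languages.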
This seems like it should be a solved problem, but I can't find any useful information on the topic.