16


Example: กิิิิิิิิิิิิิิิิิิิิ ก้้้้้้้้้้้้้้้้้้้้ ก็็็็็็็็็็็็็็็็็็็็ ก็็็็็็็็็็็็็็็็็็็็ กิิิิิิิิิิิิิิิิิิิิ ก้้้้้้้้้้้้้้้้้้้้ ก็็็็็็็็็็็็็็็็็็็็ กิิิิิิิิิิิิิิิิิิิิ ก้้้้้้้้้้้้้้้้้้้้ กิิิิิิิิิิิิิิิิิิิิ ก้้้้้้้้้้้้้้้้้้้้ ก็็็็็็็็็็็็็็็็็็็็ ก็็็็็็็็็็็็็็็็็็็็ กิิิิิิิิิิิิิิิิิิิิ ก้้้้้้้้้้้้้้้้้้้้ ก็็็็็็็็็็็็็็็็็็็็ กิิิิิิิิิิิิิิิิิิิิ ก้้้้้้้้้้้้้้้้้้้้ (or any "zalgo" text)

I haven't been able to figure out a way to check for these. I'm building a kind of anti-spam filter, and I see no reason to keep such messages: they can lag users' clients and are generally just spam.

What I'm trying to do is something like this (pseudocode):

if (getMessage().getRawContent().contains(/* combining character */)) getMessage().delete();

If anyone knows a simple way to check for combining characters, please post!

If it's unclear what I am asking, I can explain further and show more examples.

Morse
Miss Cartoon
  • Is it only my browser or is the question intended to be that way http://imgur.com/a/zNR17 ? Weird AF o.O – Jorge Campos Apr 18 '17 at 01:55
  • 5
    @JorgeCampos I’m pretty sure it’s intentional. The question is asking how to detect abuse of combining characters. – VGR Apr 18 '17 at 01:56
  • 1
    @Jorge: mine too, and let's say: what a damn nice hack! Waiting for the promised **more examples**. Yes, we need'em! – statosdotcom Apr 18 '17 at 01:57
  • @statosdotcom Yeah thought the same lol – Jorge Campos Apr 18 '17 at 01:57
  • 3
    Have you guys seen [this infamous answer](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454)? – VGR Apr 18 '17 at 02:00
  • This is unrelated, but I'm wondering whether there is a similar question for Python. I'd like to know the solution! – titipata Apr 18 '17 at 02:51
  • @statosdotcom You can mess around with this little tool (I forgot who made it, but it wasn't me): http://jsbin.com/erajer/edit?html You will have to find the HTML IDs for the combining characters, but once you do you can make some insanely long lines of death. They used to work as a YouTube name as well, and one person's name took up half the comments section. – Miss Cartoon Apr 18 '17 at 04:33
  • @Miss Cartoon Thanks for the pleasure of introducing me to these real anarchistic manifestos. Plus one for you. I hadn't even heard of them. It's an interesting and very nice world we're living in. May the God of Cartoons (Bob Crumb?) bless you. – statosdotcom Apr 18 '17 at 04:43

1 Answer

15

There are plenty of cases where one or two consecutive combining characters are perfectly valid text. I would look for four or more of them in a row:

if (getMessage().getRawContent().matches(".*\\p{Mn}{4}.*"))
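For reuse, the same check can be wrapped in a small helper. This is only a sketch: the class name `ZalgoDetector` and method name `hasCombiningRun` are made up for illustration, and the threshold of four comes from the suggestion above. It uses `Matcher.find()` instead of `String.matches()`, because `matches()` must cover the entire string and `.` does not match line terminators by default, so the one-liner can miss a run of marks in a multi-line message.

```java
import java.util.regex.Pattern;

// Hypothetical helper; the class and method names are illustrative only.
final class ZalgoDetector {
    // Four or more consecutive nonspacing marks (Unicode general category Mn).
    // The threshold of 4 is a heuristic and may need tuning for some scripts.
    private static final Pattern COMBINING_RUN = Pattern.compile("\\p{Mn}{4,}");

    private ZalgoDetector() {
    }

    // Returns true if the text contains a run of four or more combining marks
    // anywhere, including in messages that span several lines.
    static boolean hasCombiningRun(String text) {
        return COMBINING_RUN.matcher(text).find();
    }
}
```

It could then be called as `if (ZalgoDetector.hasCombiningRun(getMessage().getRawContent())) { ... }`, with the body doing whatever your bot uses to remove the message.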
VGR