On social media, people send keyboard-mash messages like "cjhdfsjnsnjd", "skhjfvhjdyg", or "dpaopdjjjg" to get a laugh. Some people also hide swear words inside these random messages. I made a Discord bot in Python to detect this. The program does not search for the swear word as a whole; it matches the word's letters even when they are scattered through the text, so it still catches a swear word that has been spread out among random letters. The problem is that in normal sentences it also combines scattered characters and flags them as a swear word. Is there a function or library in Python that can tell whether a text is random characters or a normal sentence?

willy.js
  • How would you define "random text"? How would you define "normal sentence"? – matszwecja Aug 29 '23 at 14:38
  • Random text examples: "jhfsdfjdfhjg", "wcokuuvnvdjdm", "hfvbnd", and the like. Examples of normal sentences: "hello how are bro", "whats up", "i am fine thanks", and the like. – willy.js Aug 29 '23 at 14:42
  • There isn't really any way to achieve this, and even if there is, it's not going to be 100% accurate: it will block some normal words and allow some "random text". – Thornily Aug 29 '23 at 14:46
  • What if the messages are in Polish or Czech? That might appear random to you. – Ant0ine64 Aug 29 '23 at 14:51
  • The typical way to measure randomness is entropy. Mind, _none_ of your text is very random -- anything typed on a keyboard by a human is a lot less random than something created by a proper random number generator -- but you might want to look at existing questions like [fastest way to compute entropy in Python](https://stackoverflow.com/questions/15450192/fastest-way-to-compute-entropy-in-python). One could also compare to a typical probability distribution for various languages, but that too will have false positives. – Charles Duffy Aug 29 '23 at 15:03
  • thanks, i will try. – willy.js Aug 29 '23 at 15:06
  • If you want a proper NLP toolkit, [NLTK](https://www.nltk.org/) is the (aging) gold standard, and [spaCy](https://spacy.io/) a more modern competitor; either includes models trained on various languages and so can be used to guess which language (if any) text is written in -- but really, there's a lot that needs to be nailed down in defining your requirements before you have something narrow enough to be a topical question. – Charles Duffy Aug 29 '23 at 15:06
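
For reference, here is a minimal sketch of the two ideas raised in the comments above: per-character Shannon entropy, and a comparison against a typical letter distribution for one language. It uses only the standard library. The names `shannon_entropy`, `mean_log_likelihood`, and `ENGLISH_FREQ` are made up for this illustration, and the frequency table holds approximate, commonly cited values for English only, so treat it as an assumption rather than a reference.

```python
import math
from collections import Counter

# Approximate relative letter frequencies for English text, in percent.
# Ballpark values (an assumption for this sketch); they are rounded and
# do not sum to exactly 100.
ENGLISH_FREQ = {
    "e": 12.7, "t": 9.1, "a": 8.2, "o": 7.5, "i": 7.0, "n": 6.7,
    "s": 6.3, "h": 6.1, "r": 6.0, "d": 4.3, "l": 4.0, "c": 2.8,
    "u": 2.8, "m": 2.4, "w": 2.4, "f": 2.2, "g": 2.0, "y": 2.0,
    "p": 1.9, "b": 1.5, "v": 1.0, "k": 0.8, "j": 0.15, "x": 0.15,
    "q": 0.1, "z": 0.07,
}


def shannon_entropy(text: str) -> float:
    """Shannon entropy of the character distribution, in bits per character."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def mean_log_likelihood(text: str) -> float:
    """Average natural-log probability per letter under ENGLISH_FREQ.

    Characters that are not in the table (spaces, digits, punctuation,
    non-English letters) are ignored. Returns -inf if nothing is left.
    """
    letters = [ch for ch in text.lower() if ch in ENGLISH_FREQ]
    if not letters:
        return float("-inf")
    return sum(math.log(ENGLISH_FREQ[ch] / 100) for ch in letters) / len(letters)


if __name__ == "__main__":
    samples = ("jhfsdfjdfhjg", "wcokuuvnvdjdm", "hello how are bro", "whats up")
    for sample in samples:
        print(
            f"{sample!r}: entropy={shannon_entropy(sample):.2f} bits/char, "
            f"mean log-likelihood={mean_log_likelihood(sample):.2f}"
        )
```

On short chat messages the entropy values tend to overlap, which matches the remark that none of this text is very random. The letter-frequency score separates the examples from the question somewhat better, but any cutoff would have to be tuned on real messages, it will still produce false positives, and it only covers English.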

0 Answers