check if a name seems "human"?

Question

I have an online RPG game which I'm taking seriously. Lately I've been having problems with users making bogus characters with bogus names that are just a jumble of letters, like Ghytjrhfsdjfnsdms, Yiiiedawdmnwe and Hhhhhhhhhhejejekk. I force them to change their names, but it's becoming too much work. What can I do about this?

Could I somehow check that at least you can't use more than 2 of the same letter beside each other? And maybe also whether it contains vowels?

Which languages are you supporting? English-only? Because if you support a language you don't speak, then how will you know whether a name is a name in an unfamiliar language, or just bogus? — John Saunders, Jul 15 '10 at 18:33
Ghytjrhfsdjfnsdms = Troll, Yiiiedawdmnwe = Elf, Hhhhhhhhhhejejekk = Goblin — Gordon, Jul 15 '10 at 20:05
>> you can't use more than 2 of the same letter beside each other - Problematic with perfectly good names like Allan, Abbie, Phillip, etc. — Mark Baker, Jul 15 '10 at 20:18
@Mark: Those would be fine since he said >2 characters next to each other, not >=2 — Falmarri, Jul 15 '10 at 20:24
You should try a Naive Bayes classifier similar to the one used to filter spam --- it's easy to implement and test. — Jacob, Jul 16 '10 at 00:47

score 11 · Answer 1 · answered Jul 15 '10 at 18:36

I would recommend concentrating your energy on building a user interface that makes it brain-dead easy to list all new names to an administrator, and a big fat "force to rename" mechanism that minimizes the admin's workload, rather than trying to define the incredibly complex and varied rules that make a name (and program a regular expression to match them!).

Update: one thing comes to mind, though. Second Life used to let you freely choose a first name (maybe they check it against a database of first names, I don't know) and then gave you a selection of a few hundred pre-defined last names to choose from. For an online RPG, that may already be enough.
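A minimal sketch of the Second Life-style idea in PHP. It assumes the surnames live in a fixed array (the four names below are made-up placeholders, not Second Life's actual list) and assumes a simple letters-only rule for the free-form first name; isAllowedCharacterName() is a hypothetical helper:

```php
<?php
// Hypothetical placeholder list; a real game would offer a few hundred curated surnames.
const ALLOWED_LAST_NAMES = ['Ashdown', 'Blackwood', 'Ironfoot', 'Stormwind'];

function isAllowedCharacterName(string $first, string $last): bool
{
    // Assumed rule: first name is 2-20 ASCII letters; anything odd is left to admin review.
    if (!preg_match('/^[A-Za-z]{2,20}$/', $first)) {
        return false;
    }
    // The last name must be picked from the fixed list (exact, case-sensitive match).
    return in_array($last, ALLOWED_LAST_NAMES, true);
}
```

In the UI the last name would be a dropdown, so the server-side check mostly guards against tampered form submissions.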

edited Jul 15 '10 at 18:42

Unicron

@Daniel 'yar' Rosenstark, I don't get such remarks. I mean, only if people simply answer the question being asked, is *that* a true answer to a question? I really hope not. I mean, if someone asks how to build a house with just a hammer, should one try to help this person on his/her way with just the hammer, or should one answer that it might not be a good idea to use only a hammer and suggest other tools as well? I sure hope it's the latter. – Bart Kiers Jul 15 '10 at 18:48
Adding to this, the main problem with other methods is false positives, but you could use another method to sort names by "most likely to be fake". – Brendan Long Jul 15 '10 at 18:48
@Bart K. thanks. :) But I don't think @Daniel was attacking the answer, quite the contrary. And strictly speaking, my answer *is* arguably not quite what the OP asked for - even though we do think it's for the better that it isn't. – Unicron Jul 15 '10 at 19:25
@Bart K., I was being facetious, mostly. I also was one of the first upvoters of @Unicron's answer (totally unverifiable, but true :)). SOMETIMES, however (obviously not the OP's case), we are confined to a narrow solution space, but you're right. The answer's update is good too. – Dan Rosenstark Jul 15 '10 at 19:31
@Unicron, no, I didn't mean that he attacked your answer. I've just seen it happen quite a few times: someone getting an answer that did not address the actual question 100% and then getting a reply that it wasn't really an answer (which is nonsense, IMO). – Bart Kiers Jul 15 '10 at 19:33
@Daniel, yeah, sorry, I probably came across as a bit harsh. It's probably because I've seen the *"Not an answer"* without the part *"but a good answer nonetheless"* and finally decided to give a reply (which I haven't done in the past...). :) – Bart Kiers Jul 15 '10 at 19:37
... and I finally wanted to use my *house-building analogy*, of course. :) – Bart Kiers Jul 15 '10 at 19:38
@Bart K. no worries, we're all trying to use as many cool analogies as possible where applicable. – Dan Rosenstark Jul 15 '10 at 22:56

score 6 · Answer 2 · answered Jul 15 '10 at 18:38

You could use a metaphone implementation and then look for "unnatural" patterns:

http://www.php.net/manual/en/function.metaphone.php

This is the PHP function for metaphone string generation. You pass in a string and it returns the phonetic representation of the text. You could, in theory, pass a large number of "human" names and then store a database of valid combinations of phonemes. To test a questionable name, just see if the combinations of phonemes are in the database.
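A rough sketch of that idea in PHP. It treats "combinations of phonemes" as adjacent pairs of characters in the metaphone() key, which is an assumption on my part; the helper names are hypothetical, and the known-name list would come from whatever name source you trust:

```php
<?php
// Split a name's metaphone key into adjacent symbol pairs (an assumed notion of "combination").
function metaphonePairs(string $name): array
{
    $key = metaphone($name);
    $pairs = [];
    for ($i = 0; $i + 1 < strlen($key); $i++) {
        $pairs[] = substr($key, $i, 2);
    }
    return $pairs;
}

// Collect every pair seen in known-good names; array keys act as a hash set.
function buildPairSet(array $knownNames): array
{
    $set = [];
    foreach ($knownNames as $name) {
        foreach (metaphonePairs($name) as $pair) {
            $set[$pair] = true;
        }
    }
    return $set;
}

// A name is "plausible" only if every one of its pairs was seen in the known names.
function looksPhoneticallyPlausible(string $name, array $pairSet): bool
{
    foreach (metaphonePairs($name) as $pair) {
        if (!isset($pairSet[$pair])) {
            return false; // unseen combination: suspicious
        }
    }
    return true;
}
```

You would build the pair set once from a large name list and cache it. Keep in mind that metaphone() follows English pronunciation rules, so names from other languages may be flagged unfairly.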

Hope this helps!

mattbasta

This seems closer to what the OP was looking for. An algorithm has already been documented and implemented: http://www.sil.org/computing/lascruces.html – Kilanash Jul 15 '10 at 19:44
That sounds good, but isn't that somehow related to spelling correction in PHP? Correct me if I am wrong. – tisuchi Dec 28 '17 at 03:43

score 4 · Answer 3 · answered Jul 15 '10 at 18:46

Would limiting the number of consecutive consonants or vowels, and preventing repeated letters, help? As a regex:

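// Flag 4+ consonants in a row, 4+ vowels in a row,
// or the same letter 3+ times in a row (case-insensitive).
if (preg_match('/[bcdfghjklmnpqrtsvwxyz]{4}|[aeiou]{4}|([a-z])\1{2}/i', $name)) {
    // reject
}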

Possibly use iconv with ASCII//TRANSLIT first if you allow accented characters.
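A sketch that puts the transliteration step in front of the same pattern. isSuspiciousName() is a hypothetical helper. What //TRANSLIT produces for accented input depends on the platform's iconv implementation and the LC_CTYPE locale (with glibc you typically need something like setlocale(LC_CTYPE, 'en_US.UTF-8') first), and some builds reject //TRANSLIT entirely, so the sketch falls back to the raw name:

```php
<?php
function isSuspiciousName(string $name): bool
{
    // Fold accented letters to ASCII where the platform supports it.
    $ascii = iconv('UTF-8', 'ASCII//TRANSLIT', $name);
    if ($ascii === false) {
        $ascii = $name; // transliteration unavailable: check the raw name instead
    }
    // 4+ consonants, 4+ vowels, or the same letter 3+ times in a row.
    return preg_match('/[bcdfghjklmnpqrstvwxyz]{4}|[aeiou]{4}|([a-z])\1{2}/i', $ascii) === 1;
}
```

Test it on your own server with a few accented names before relying on it.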

Wrikken

I cannot downvote my own post, but to my mind this seems like a bad solution 11 years down the line. Don't make these kinds of assumptions about names. – Wrikken Mar 18 '21 at 17:02

score 3 · Answer 4 · answered Jul 15 '10 at 18:33

What if you used the Google Search API to see whether the name returns any results?

Matthew J Morrison

That gets back to a name seeming "human" - rather than a specific language. – Matthew J Morrison Jul 15 '10 at 18:36
Clever, but not reliable. – Capt Otis Jul 15 '10 at 18:37
This seems like a sensible idea, if only to highlight the most ridiculous names in an admin UI – Chris Johnson Jul 15 '10 at 18:47
@Kenny: oh no, I'm trapped in recursion; the fourth result in that google search is this page! – Andy E Jul 15 '10 at 23:15
This won't work... Look at Kenny's example... I mean, "fffffffff" returns a bunch of pages. – Peter Ajtai Jul 16 '10 at 01:15

score 3 · Answer 5 · answered Jul 15 '10 at 18:41

I say take @Unicron's approach of easy admin rejection, but on each rejection, add the name to a database of banned names. You might be able to use this data to detect specific attacks that generate large numbers of users following a pattern. Of course, one-offs will still be very difficult to detect.
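A sketch of the "learn from rejections" idea, assuming an in-memory list in place of a real database table and an edit-distance threshold of 2 that you would need to tune. RejectedNames is a hypothetical class; it uses PHP's built-in levenshtein() to find the closest previously rejected name:

```php
<?php
final class RejectedNames
{
    /** @var string[] names admins have rejected, stored lowercased */
    private array $names = [];

    public function add(string $name): void
    {
        $this->names[] = strtolower($name);
    }

    // Returns the closest rejected name within $maxDistance edits, or null if none is that close.
    public function closestMatch(string $candidate, int $maxDistance = 2): ?string
    {
        $candidate = strtolower($candidate);
        $best = null;
        $bestDistance = $maxDistance + 1;
        foreach ($this->names as $rejected) {
            $d = levenshtein($candidate, $rejected);
            if ($d < $bestDistance) {
                $best = $rejected;
                $bestDistance = $d;
            }
        }
        return $best;
    }
}
```

Names that come back non-null could be pushed to the top of the admin review queue rather than rejected automatically. A linear scan is fine for a few thousand rejections; beyond that you would want something smarter.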

sparkey0

Good idea storing away precedents! – Unicron Jul 15 '10 at 19:22

score 2 · Answer 6 · answered Jul 15 '10 at 18:37

I had this issue as well. An easy way to solve it is to force user names to validate against a database of worldwide names. Essentially, you keep a database on the backend with a few hundred thousand first and last names for both genders, and require the user's name to match an entry.

With a little searching on Google, you can find many name databases.
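A minimal lookup sketch, assuming the name list has been loaded as an array of lines (for example with file() on a one-name-per-line file). The helper names and the file name in the usage comment are hypothetical:

```php
<?php
// Build a case-insensitive set of known names; array keys give fast lookups.
function loadNameSet(array $lines): array
{
    $set = [];
    foreach ($lines as $line) {
        $name = strtolower(trim($line));
        if ($name !== '') {
            $set[$name] = true;
        }
    }
    return $set;
}

function isKnownName(string $name, array $nameSet): bool
{
    return isset($nameSet[strtolower(trim($name))]);
}

// Usage (hypothetical file name):
// $names = loadNameSet(file('first_names.txt', FILE_IGNORE_NEW_LINES));
// if (!isKnownName($submitted, $names)) { /* ask for another name */ }
```

With a few hundred thousand entries, an indexed database column would be the more usual home for this list than an in-memory array.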

score 2 · Answer 7 · answered Jul 15 '10 at 18:37

Could I somehow check that at least you can't use more than 2 of the same letter beside each other? And maybe also whether it contains vowels?

If you just want this, you can do:

preg_match('/(.)\\1\\1/i', $name);

This will return 1 if any character appears three or more times in a row.
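The question also asked about vowels, so here is a sketch that adds that check to the one above. Counting y as a vowel is an assumption (so that names like Lynn pass), and passesBasicChecks() is a hypothetical name:

```php
<?php
function passesBasicChecks(string $name): bool
{
    // Same character three or more times in a row (case-insensitive): reject.
    if (preg_match('/(.)\1\1/i', $name)) {
        return false;
    }
    // Require at least one vowel (y counted as a vowel).
    return preg_match('/[aeiouy]/i', $name) === 1;
}
```

Note that Ghytjrhfsdjfnsdms still passes (it contains a y and has no triple letters), which shows how little these two rules catch on their own.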

score 1 · Answer 8 · answered Jul 15 '10 at 18:33

This link might help. You might also be able to plug it through a (possibly modified) speech synthesiser engine and analyse how much trouble it's having generating the speech, without actually generating it.

Chris Dennett

score 1 · Answer 9 · answered Jul 15 '10 at 23:48

You should try implementing a modified version of a Naive Bayes spam filter. For example, in normal spam detection you calculate the probability of a word being spam and use individual word probabilities to determine if the whole message is spam.

Similarly, you could download a word list, and compute the probability that a pair of letters belongs to a real word.

For example, create a 26x26 table, say T. Let the 5th row represent the letter e, and let entry T(5,1) be the number of times ea appears in your word list. Once you're done counting, divide each element of each row by the sum of that row, so that T(5,1) becomes the fraction of letter pairs starting with e that are ea.

Now you can use the individual pair probabilities (e.g. for Jimy, the pairs {ji, im, my}) to check whether Jimy is an acceptable name. You'll probably have to determine the right probability threshold, but try it out: it's not that hard to implement.
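A sketch of the pair table in PHP. Rows and columns are 0-indexed here, so the answer's T(5,1) (e followed by a) is $table[4][0]. Averaging log-probabilities and flooring unseen pairs at 1e-4 (to avoid log(0)) are my assumptions; the function names are hypothetical, and the threshold still has to be found by testing against real names:

```php
<?php
// Count letter pairs in a word list, then normalise each row to probabilities.
function buildPairTable(array $words): array
{
    $counts = array_fill(0, 26, array_fill(0, 26, 0));
    foreach ($words as $word) {
        $w = preg_replace('/[^a-z]/', '', strtolower($word));
        for ($i = 0; $i + 1 < strlen($w); $i++) {
            $counts[ord($w[$i]) - 97][ord($w[$i + 1]) - 97]++;
        }
    }
    // Divide each row by its sum: T[a][b] = P(next letter is b | current letter is a).
    foreach ($counts as $row => $cols) {
        $sum = array_sum($cols);
        foreach ($cols as $col => $n) {
            $counts[$row][$col] = $sum > 0 ? $n / $sum : 0.0;
        }
    }
    return $counts;
}

// Average log-probability of a name's letter pairs; closer to 0 means more "word-like".
function pairScore(string $name, array $table, float $floor = 1e-4): float
{
    $w = preg_replace('/[^a-z]/', '', strtolower($name));
    $n = strlen($w) - 1; // number of letter pairs
    if ($n < 1) {
        return 0.0; // too short to judge; treated as acceptable
    }
    $logSum = 0.0;
    for ($i = 0; $i < $n; $i++) {
        $p = $table[ord($w[$i]) - 97][ord($w[$i + 1]) - 97];
        $logSum += log(max($p, $floor));
    }
    return $logSum / $n;
}
```

With a table built from a large word or name list, a name would be rejected (or flagged for review) when pairScore() falls below the chosen threshold. Averaging keeps long and short names comparable.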

score 0 · Answer 10 · answered Jul 15 '10 at 18:49

What do you think about delegating user creation to a third-party provider (like Facebook, Twitter, OpenID...)?

Doing that will not solve your problem, but it will be more work for a user to create additional accounts - which (assuming that the users are lazy, since most are) should discourage the creation of additional "dummy" users.

score -3 · Answer 11 · answered Jul 15 '10 at 18:35

It seems as though you are going to need a fairly complex preg function. I don't want to take the time to write one for you, as you will learn more writing it yourself, but I will help along the way if you post some attempts.

http://php.net/manual/en/function.preg-match.php

Capt Otis

Good luck with that. Whether it's code or a regular expression, it's still going to be impossible not to have false positives. – wadesworld Jul 15 '10 at 18:39
@Wade Williams - is "impossible not to have false positives" a triple negative? – Matthew J Morrison Jul 15 '10 at 18:43
Yeah good point. But almost no solution is going to be perfect here. – Capt Otis Jul 15 '10 at 19:08