Regex for keyboard mashing

Question

When signing up for new accounts, web apps often ask for the answer to a 'security question', e.g. a dog's name.

I'd like to go through our database and look for instances where users just mashed the keyboard instead of providing a legitimate answer. This is a strong indicator of an abusive/fraudulent account.

"Mother's maiden name?" lakdsjflkaj

Any suggestions as to how I should go about doing this?

Note: I'm not ONLY using regular expressions on these 'security question answers'

The 'answers' can be:

- Selected from a db using a few basic SQL regexes
- Analyzed as many times as necessary using Python regexes
- Compared/pruned/scored as needed

This is a technical question, not a philosophical one ;-)

Thanks!

Fraudulent? Maybe just a legitimate way to answer stupid questions that don't necessarily increase security. – innaM Jul 21 '09 at 14:54
Seriously, I've seen names in Balkan languages that were less readable and pronounceable than "lakdsjflkaj". – Michael Myers Jul 21 '09 at 14:55
Or, to put it another way, there are a lot of languages that just look like keyboard mashing to me. It all depends on your point of view. – Michael Myers Jul 21 '09 at 15:01
I once had a colleague whose name consisted of 5 consecutive letters on a German keyboard. – starblue Jul 21 '09 at 15:10
@Marcel You should have said you're trying to write a PawSense clone, that way nobody would give you trouble. http://www.bitboost.com/pawsense/ – itsadok Jul 22 '09 at 05:10

tanascius · Answer 1 · 2012-07-12T12:31:05.763

40

I would not do this. In my opinion these questions weaken security, so as a user I always try to provide another semi-password as the answer. To you it would look mashed. Well, it is mashed, but that is exactly what I intend.

By the way, I am not sure you should be able to query the answers at all. Since they bypass your password protection, they should be handled like passwords, i.e. stored as a hash!

Edit:
When I read this article I instantly remembered this question ;-)

edited Jul 12 '12 at 12:31

answered Jul 21 '09 at 14:55

tanascius

+1, I use a separate password for my secret question answers; also, they definitely should be stored as hashes – Petey B Jul 21 '09 at 15:01
The app's db already has this info stored. I'm looking for slick ways of finding the people that mashed the keyboard. A semi-password would not look 'mashed', since there is some thought put into it... mashed has lots of home-row letters like 'asdf' and so on. Hence the challenge. ;-) – Marcel Chastain Jul 21 '09 at 15:06
Your db should not be queryable for this kind of mashing if you store the answers properly. And how do you separate real passwords from keyboard sequences? A password is random and can look like asd&2!Mpe ... while a lazy user could type asdokm ... just by using two hands ... where is the difference that a program could find? – tanascius Jul 21 '09 at 15:15
If you really want to force security on your users, letting them choose their own password isn't the right way ... – Jochen Ritzel Jul 21 '09 at 15:17
This isn't an issue over passwords guys, but about the answer to a 'security question'. – Marcel Chastain Jul 21 '09 at 15:21
And the answer to the security question should also be encrypted. – Nick Lewis Jul 21 '09 at 15:25
Gah. Threadjacked. Looking for **technical solutions** here guys. An answer with the word 'should' in it (i.e. "you should do this instead") is not answering the technical question. – Marcel Chastain Jul 21 '09 at 15:53
I always mash the keyboard for these; "security questions" are insecure. I'd be quite irate if a site then told me "no, you *have* to give me your mother's maiden name", or if I was accused of abuse because I understand basic security. (I don't care if you're looking for technical solutions; when you ask for something that sounds like an inherently bad idea piled on top of an even worse idea, you'll just have to put up with people telling you so.) – Glenn Maynard Jul 21 '09 at 18:13
I mash the keyboard too... but that doesn't help me write a regex. Some of the less popular comments have great advice that applies to the challenge. – Marcel Chastain Jul 21 '09 at 19:47
p'dfsvpjkoFBjopDFipwdvwvbjopdrbW meant to do it – Thomas Dignan Jul 08 '11 at 02:31

nik · Answer 2 · 2009-07-21T15:23:32.730

13

The whole approach of security questions is quite flawed.

I have always found that people choose security answers that are weaker than the passwords they use.
Security questions are just one more link in a security chain -- the weaker link!

IMO, a better way to go would be to allow the user to request that a new password be sent to their registered e-mail id. This has two advantages:

- a brute-force attempt has to locate and break the e-mail service first (and you will never help them there: keep the registration e-mail id well protected)
- the user of your service will always get an indication when someone attempts a brute-force attack (they get a mail saying that someone tried to regenerate their password)

If you MUST have secret questions, let a correct answer trigger the dispatch of a regenerated password to the e-mail id they registered with, and do not display that e-mail id at all. Never send the user's existing password; generate a temporary one, preferably one-time with a forced change.

Another trick is to make the secret question ITSELF their registered e-mail id.
If they put it right, you send a re-generated temporary password to that e-mail id.

edited Jul 21 '09 at 15:23

answered Jul 21 '09 at 15:16

nik

Well yeah, I haven't discussed what exactly happens after they press submit.
Your ideas are sound. In our app, they have to answer a security question in order for a new password to be sent to their registered email id, exactly as you said.
This challenge is all about *detecting mashing patterns with regex + code*, but I think we started a debate about security questions as a whole ;-)
Thanks again for your input. – Marcel Chastain Jul 21 '09 at 15:30
Well, guess that means no html in comments huh. – Marcel Chastain Jul 21 '09 at 15:31
That's worse yet. You're just making security-conscious users, who never input real answers to security questions, unable to recover their password. – Glenn Maynard Jul 21 '09 at 18:15
I always just put the password in when I *have* to put something in. – Brad Gilbert Jul 22 '09 at 03:32

score 6 · Answer 3 · answered Jul 21 '09 at 14:56

There's no way to do this with a regex. Actually, I can't think of a reasonable way to do this at all. Where would you draw the line between suspicious and unsuspicious? I, for one, often answer security questions with an obfuscated answer. After all, my mother's maiden name isn't the hardest thing to find out.

answered Jul 21 '09 at 14:56

balpha

obfuscated != mashed ... mashed is a fairly distinct distribution of letter frequency and spacing, esp w/lots of home row or adjacent keys. I'm not looking for 100% accuracy here, of course. I have close to a million of these 'security answers' stored, and I want to find the really suspicious ones. – Marcel Chastain Jul 21 '09 at 15:09

score 6 · Accepted Answer · answered Jul 21 '09 at 14:59

You're probably better off analyzing the n-gram distribution, similar to language detection.

This code is an example of language detection using trigrams. My guess is that keyboard-mashing trigrams are fairly distinctive and rarely appear in normal language.
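
To make the trigram idea concrete, here is a rough Python sketch (separate from the linked code, and not a tuned detector). It counts character trigrams in a reference list of plausible answers and then reports what fraction of a candidate's trigrams never occur in that reference. The function names and the tiny `REFERENCE` sample are made up for illustration; a real run would presumably need a large list of names or words drawn from the user base.

```python
from collections import Counter

def trigrams(text):
    """Return the character trigrams of text, padded so word edges count."""
    padded = f" {text.lower().strip()} "
    return [padded[i:i + 3] for i in range(len(padded) - 2)]

def build_model(reference_words):
    """Count every trigram seen in the reference words."""
    counts = Counter()
    for word in reference_words:
        counts.update(trigrams(word))
    return counts

def unseen_ratio(text, model):
    """Fraction of the text's trigrams that never occur in the model."""
    grams = trigrams(text)
    if not grams:
        return 0.0
    unseen = sum(1 for g in grams if g not in model)
    return unseen / len(grams)

# Hypothetical, far-too-small reference sample, for illustration only.
REFERENCE = ["smith", "johnson", "williams", "anderson", "martin", "thompson"]
MODEL = build_model(REFERENCE)

for answer in ["smith", "lakdsjflkaj"]:
    print(answer, unseen_ratio(answer, MODEL))
```

A high ratio only means "unlike the reference list", so answers scored this way are candidates for review, not proof of mashing.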

answered Jul 21 '09 at 14:59

itsadok

Thanks for your input. This is a step in the right direction for me. More ideas like this, please..! -- mC – Marcel Chastain Jul 21 '09 at 15:32
Wow, this is fantastic..! -- mC – Marcel Chastain Jul 21 '09 at 15:40

score 4 · Answer 5 · answered Jul 21 '09 at 14:57

If you can find a table of letter-pair probabilities for English, you could estimate the probability that a word is not a "real" English word from its least probable pairs and from pairs that are missing from the table. Unfortunately, names and other "non-words" can't be forced to look like English words.
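
A minimal sketch of the letter-pair idea in Python, assuming you do not have a published probability table and instead estimate the pair probabilities from your own reference list. Everything here (the function names, the add-one smoothing, the six-name `REFERENCE` sample) is an illustrative assumption, not an established recipe.

```python
import math
from collections import Counter

ALPHABET_SIZE = 26  # assumes answers are reduced to the letters a-z

def letter_pairs(word):
    """Adjacent letter pairs of a word, ignoring anything outside a-z."""
    letters = [c for c in word.lower() if "a" <= c <= "z"]
    return ["".join(pair) for pair in zip(letters, letters[1:])]

def train(words):
    """Count letter pairs, and how often each letter starts a pair."""
    pair_counts, first_counts = Counter(), Counter()
    for word in words:
        for pair in letter_pairs(word):
            pair_counts[pair] += 1
            first_counts[pair[0]] += 1
    return pair_counts, first_counts

def pair_log_prob(pair, pair_counts, first_counts):
    """log P(second letter | first letter), with add-one smoothing."""
    return math.log((pair_counts[pair] + 1) /
                    (first_counts[pair[0]] + ALPHABET_SIZE))

def mean_log_prob(word, pair_counts, first_counts):
    """Average pair log-probability; lower means less word-like."""
    pairs = letter_pairs(word)
    if not pairs:
        return 0.0
    total = sum(pair_log_prob(p, pair_counts, first_counts) for p in pairs)
    return total / len(pairs)

# Hypothetical, far-too-small reference sample, for illustration only.
REFERENCE = ["smith", "johnson", "williams", "anderson", "martin", "thompson"]
PAIR_COUNTS, FIRST_COUNTS = train(REFERENCE)

for answer in ["smith", "lakdsjflkaj"]:
    print(answer, round(mean_log_prob(answer, PAIR_COUNTS, FIRST_COUNTS), 3))
```

Averaging over pairs keeps long answers from being penalised for length alone; taking the minimum of `pair_log_prob` instead would focus on the single least probable pair.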

answered Jul 21 '09 at 14:57

Tim Sylvester

Hmm, I like this. I'll check up on this one. Thanks for your feedback. – Marcel Chastain Jul 21 '09 at 15:10
This is similar to the comment about 'analyzing n-gram distribution'. Great stuff, thanks again -- mC – Marcel Chastain Jul 21 '09 at 15:47
Not all users are native English speakers. People could very easily choose to put their mother's maiden name in its native Chinese, or to put "ワンコ" as their first pet's name. – Glenn Maynard Jul 21 '09 at 18:17

score 4 · Answer 6 · answered Jul 21 '09 at 15:22

Maybe you could check for an abundance of consonants. For instance, your example lakdsjflkaj has 2 vowels (both a) and 9 consonants. When keys are pressed at random, the probability of hitting a vowel is much lower than that of hitting a consonant.
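
The consonant check is easy to sketch in Python. The 0.2 threshold below is an arbitrary assumption that would need tuning against real data, and the helper names are made up for this example.

```python
VOWELS = set("aeiou")

def vowel_ratio(text):
    """Share of a-z letters that are vowels, or None if there are no letters."""
    letters = [c for c in text.lower() if "a" <= c <= "z"]
    if not letters:
        return None
    return sum(c in VOWELS for c in letters) / len(letters)

def looks_consonant_heavy(text, threshold=0.2):
    """Flag text whose vowel share falls below the (assumed) threshold."""
    ratio = vowel_ratio(text)
    return ratio is not None and ratio < threshold

for answer in ["lakdsjflkaj", "johnson"]:
    print(answer, vowel_ratio(answer), looks_consonant_heavy(answer))
```

On its own this is a weak signal (short consonant-heavy surnames exist), so it probably works best as one score among several.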

answered Jul 21 '09 at 15:22

Geo

Interesting approach. I think this would work well with some of the other tests I have in store. Thanks! -- mC – Marcel Chastain Jul 21 '09 at 15:32

score 3 · Answer 7 · answered Feb 11 '17 at 22:05

Dejunk is a Ruby library from which you can draw inspiration. It implements a few of the suggestions in other answers. It considers input to be keyboard mashing if the input:

- Contains character bigrams that are unlikely to appear in real text, but that are close together on a keyboard. (The library includes a list of such bigrams.)
- Starts with an unexpected punctuation mark.
- Has too many very short words.
- Has no vowels.
- Has characters that are repeated an unreasonable number of times.
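
For readers working in Python, four of these heuristics can be approximated as below. This is not a port of Dejunk and does not reproduce its thresholds; every function name and cutoff is an assumption chosen for illustration. The bigram check is left out because it depends on the library's own bigram list.

```python
import re
import string

def starts_with_punctuation(text):
    """True if the first non-space character is ASCII punctuation."""
    stripped = text.strip()
    return bool(stripped) and stripped[0] in string.punctuation

def has_no_vowels(text):
    """True if the text has letters but none of a, e, i, o, u, y."""
    letters = [c for c in text.lower() if "a" <= c <= "z"]
    return bool(letters) and not any(c in "aeiouy" for c in letters)

def has_long_repeat(text, max_run=3):
    """True if any character repeats more than max_run times in a row."""
    return re.search(r"(.)\1{%d,}" % max_run, text) is not None

def mostly_short_words(text, short_len=2, max_share=0.5):
    """True if more than max_share of 3+ words have length <= short_len."""
    words = text.split()
    if len(words) < 3:
        return False
    short = sum(1 for w in words if len(w) <= short_len)
    return short / len(words) > max_share

def junk_flags(text):
    """Names of the heuristics that fire for this text."""
    checks = [starts_with_punctuation, has_no_vowels,
              has_long_repeat, mostly_short_words]
    return [check.__name__ for check in checks if check(text)]

for answer in ["Smith", "sdfgh", "!!!!", "a s d f g"]:
    print(repr(answer), junk_flags(answer))
```

Returning the list of fired checks, instead of a single yes/no, makes it easier to weight the heuristics differently later.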

score 2 · Answer 8 · answered Jul 21 '09 at 14:59

You could check for a capital letter at the start, though that will get you some false positives for sure.

A quick Google search gave me this; you could compare each answer against the names in that list.

Obviously this only works for the security question you stated.

Have you also seen this:

Anatomy of the Twitter attack

I'm going to think hard next time I implement a security question.

answered Jul 21 '09 at 14:59

Question Mark

Wow, that's a great article. Thanks for that! Yeah, if this was my app, I'd rethink using this feature. On the other hand, for the purposes of detecting fraudulent accounts, it might help me, given that the rest of the info (name, CC#, address, IP country, etc) is all legit. Just making lemonade over here ;-) – Marcel Chastain Jul 21 '09 at 15:18

score 2 · Answer 9 · answered Jul 21 '09 at 15:02

If your question is ever something related to a real, human name, this is impossible. Consider Asian names typed with roman characters; they may very well trip whatever filter you come up with, but are still perfectly legitimate.

answered Jul 21 '09 at 15:02

Alex S

Huh? I don't understand how Gupta, Singh, Zhang, Nguyen, Tran, Watanabe etc are going to trip up any reasonable filter, especially if the n-gram statistics are based on surname lists that relate to the customer base -- if you have enough customers, use your customers' surnames to get the statistics! In any case, you have to be prepared for false positives, and you don't send out the armed police on the basis of 1 indicator and no human review. – John Machin Jul 25 '09 at 03:20

score 0 · Answer 10 · answered Jul 21 '09 at 14:57

You could look for patterns that don't make sense phonetically. Such as:

'q' not followed by a 'u'.

asdf

qwer

zxcv

asdlasd

Basically, try mashing on your own keyboard, see what you get, and plug that in your filter. Also plug in various grammatical rules. However, since it's names you're dealing with, you'll always get 'that guy' with the weird name who will cause a false positive.
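
As a starting point, the two kinds of pattern above could be written as Python regexes like this. It assumes a QWERTY layout and treats any run of four neighbouring keys on one row as suspicious; the names and the run length of four are my own assumptions.

```python
import re

QWERTY_ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]

def row_runs(length=4):
    """All runs of `length` adjacent keys on a QWERTY row, in both directions."""
    runs = set()
    for row in QWERTY_ROWS:
        for i in range(len(row) - length + 1):
            run = row[i:i + length]
            runs.add(run)
            runs.add(run[::-1])
    return runs

# One alternation of every row run, e.g. asdf|fdsa|qwer|...
ROW_RUN_RE = re.compile("|".join(sorted(row_runs())))
# A 'q' that is not immediately followed by 'u'.
Q_WITHOUT_U_RE = re.compile(r"q(?!u)")

def mash_hits(text):
    """Names of the patterns that match the lower-cased text."""
    lowered = text.lower()
    hits = []
    if ROW_RUN_RE.search(lowered):
        hits.append("row_run")
    if Q_WITHOUT_U_RE.search(lowered):
        hits.append("q_without_u")
    return hits

for answer in ["asdfjkl", "qwerty", "Quentin", "Doherty"]:
    print(answer, mash_hits(answer))
```

Note that "Doherty" is flagged because it contains "erty", and a run length of four misses "asdlasd" entirely, which illustrates both the false positives and the gaps to expect from fixed patterns.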

answered Jul 21 '09 at 14:57

samoz

As for users of the Dvorak keyboard layout, or French users with an AZERTY keyboard, or Russian users typing in Cyrillic... – NickFitz Jul 21 '09 at 15:03
Thanks for your input. I'll incorporate this into the final version. – Marcel Chastain Jul 21 '09 at 15:12

score 0 · Answer 11 · answered Jul 21 '09 at 15:52

Instead of regular expressions, why not just compare with a list of known good values? For example, compare Mother's maiden name with census data, or pet name with any of the pet name lists you can find online. For a much simpler version of this, just do a Google search for whatever is entered. Legitimate names should have plenty of results, while keyboard mashing should result in very few if any.
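
The list comparison might look like this in Python. The four surnames stand in for a real census or pet-name list, which you would load from a file; the helper names are hypothetical.

```python
def normalise(answer):
    """Lower-case the answer and collapse runs of whitespace."""
    return " ".join(answer.lower().split())

def load_known_values(lines):
    """Build a lookup set from an iterable of known-good values."""
    return {normalise(line) for line in lines if line.strip()}

def is_known(answer, known_values):
    """True if the normalised answer appears in the known-good set."""
    return normalise(answer) in known_values

# Hypothetical stand-in for a real surname list.
KNOWN_SURNAMES = load_known_values(["Smith", "Johnson", "  Nguyen ", "Van Dyke"])

for answer in ["  SMITH", "van   dyke", "lakdsjflkaj"]:
    print(repr(answer), is_known(answer, KNOWN_SURNAMES))
```

Since `load_known_values` accepts any iterable of strings, an open file object can be passed to it directly.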

As with any other method, you will still need to handle false positives.

answered Jul 21 '09 at 15:52

Kevin

That's an interesting approach, thanks for the input. We have several different security questions, and honestly, I'm just looking for a few hundred highly suspicious accounts that all have mashed 'security question' answers. Thanks again -- mC – Marcel Chastain Jul 21 '09 at 15:55
This is all completely ridiculous. If people want to mash the keyboard, let them - you can't be a cop all the time. – Dal Hundal Mar 03 '10 at 08:18