23

I want make an authentication system for my app along the lines of SUAS, except instead of using SHA256 for hashing passwords I'd like to use bcrypt or scrypt. Unfortunately both py-bcrypt and scrypt for python use native c, which is unsupported by GAE.

Any way around this?

agf
  • 171,228
  • 44
  • 289
  • 238
Justin Goguen
  • 273
  • 2
  • 6
  • 1
    What is the "standard way of doing it"? And I'm not using Google's system for a number of reasons, most prominent of which being when you sign out of my app, you are signed out of all other Google services. – Justin Goguen Aug 11 '11 at 14:11
  • Why not use OpenID, then? Your users probably don't want to have to create yet another user account. – Nick Johnson Aug 12 '11 at 00:17

2 Answers2

24

Scrypt and BCrypt are both extremely processor-intensive (by design). Because of this, I very much doubt any pure-python implementation is going to be fast enough to be secure - that is, be able to hash using a sufficient number of rounds within a reasonable amount of time.

I can personally attest to this, I've tried writing a pure-python BCrypt, and it was way too slow to be useful. The docs for the pure-python bcrypt implementation mentioned in another answer note this exact flaw - to beware of using it for actual security, it's rounds must be set too low. The only time such implementations will be fast enough is under pypy, which is not the situation you're faced with.


What you want to go with is something based on an available hash primitive like SHA-2. That way the heavy calculation bit will still be able to be written in C, even under GAE. I'd recommend something based on PBKDF2 or SHA-512-Crypt (note: this is not just a plain sha512 hash). The security of the algorithms is just as good, but pure-python implementations will be much more efficient, since they can leverage hashlib to do the heavy lifting.

The Passlib library might be useful in this case, it contains implementations of PBKDF2 and SHA-512-Crypt in pure python. (Disclaimer: I'm the author of that library). Another Python library with PBKDF2 support is Cryptacular.

Eli Collins
  • 8,375
  • 2
  • 34
  • 38
  • scrypt is based on a hash function, which means it'd be equally efficient as systems like PBKDF2 in pure-Python. – Nick Johnson Aug 12 '11 at 00:18
  • This package is wonderful, but whenever I try to import it in GAE I get the following error: `TypeError: object does not appear to be a crypt handler: ` – Justin Goguen Aug 12 '11 at 13:18
  • 2
    Found another module that has the same functionality and works with GAE: http://pypi.python.org/pypi/pbkdf2 – Justin Goguen Aug 12 '11 at 14:05
  • 1
    @JustinGoguen: Sorry about that bug, haven't actually done much testing on GAE. That alternative should certainly work (I think was the first pbkdf2 password hash for python). Though if you have the time, it'd be great if you could post the error and traceback on the [passlib issue tracker](http://code.google.com/p/passlib/issues), so I could look into fixing it. – Eli Collins Aug 12 '11 at 14:41
  • 1
    @NickJohnson: We don't just need a hash function, but one implemented in C and in stdlib (so GAE has the C extension). The underlying problem is that Python is relatively slow at high-volume bit-shuffling/arithmetic such as is needed by most crypto (not just hashes), so C extensions are needed for speed. Unfortunately, SCrypt is based on the [Salsa20](http://en.wikipedia.org/wiki/Salsa20) hash, and a customized version at that, so we can't install it under GAE. Also, SCrypt requires efficient storage & manipulation of a *large* number of hashes, which might be tricky under python. – Eli Collins Aug 12 '11 at 15:11
  • 1
    @EliCollins The SCrypt paper defines SCrypt as `scrypt(P, S, N, r, p, dkLen) = MFcryptHMAC SHA256,SMixr (P, S, N, p, dkLen)`. It definitely uses SHA256, which is implemented in C in Python; it also uses part of the Salsa20 cipher, but I'm not sure how much of the total runtime is spent executing that. The Python-slowdown for an implementation of SCrypt still ought to be less than that for BCrypt, since none of BCrypt can be implemented in C on standard Python. The sets of hashes SCrypt deals with are large for a GPU or FPGA, but should pose absolutely no trouble for Python. – Nick Johnson Nov 15 '11 at 23:34
  • 2
    @NickJohnson Actually, SHA256 is a minor part of [SCrypt](http://www.tarsnap.com/scrypt/scrypt.pdf): it's only called `2*p` times at the start and end of scrypt()... it's contribution to the cost is negligible (and `p` is usually fixed at 1). Whereas the Salsa 20/8 core is the basis for the entire algorithm, and is called `4*N*r*p` times (where `N` is usually large, eg `2**12`). As well, SCrypt requires allocating and manipulating arrays of bytes `128*N*r` in size, and performing many bitwise operations on them... things Python does not do nearly as efficiently as C. _(continued...)_ – Eli Collins Nov 16 '11 at 04:15
  • 2
    @NickJohnson _(...continued)_ If Python had an efficient way to xor byte arrays, and a C-based Salsa 20/8 function, it might be possible; but despite all the approaches I've tried, my pure-python [implementation](http://code.google.com/p/passlib/source/browse/passlib/utils/_slow_scrypt.py?name=scrypt-dev) of SCrypt has been stuck orders of magnitude too slow. On the other hand, BCrypt uses a fixed number of 32-bit integers, making the resource requirements drastically less, so it's performance in pure Python _is_ better than SCrypt. But for similar reasons, it's still to slow to be secure. – Eli Collins Nov 16 '11 at 04:19
  • 1
    @EliCollins Is your SCrypt implementation slower than a pure-python bcrypt implementation? Again, memory usage is unlikely to be an issue for a Python SCrypt implementation. – Nick Johnson Nov 16 '11 at 04:29
  • 1
    @NickJohnson My apologies, you have a point... using rounds values from the SCrypt paper, I get: Pure Python BCrypt w/ 2**11 rounds - 27s (13ms/round); Pure Python SCrypt w/ 2**14 rounds - 69s (4ms/round). So while it is slower for exact values in the paper, I hadn't noticed until now how much faster the ms/round is for SCrypt... and you could probably get away with as few as 2**10 SCrypt rounds, still have better security than BCrypt at 2**11. Mind you, getting < 250ms response times from these implementations requires at most 2**6 rounds, which is too low to be secure IMHO :( – Eli Collins Nov 16 '11 at 06:40
  • 1
    @EliCollins I agree that doing this in Python is less than ideal. Hopefully we can do something about that in the not-too-distant future. :/ – Nick Johnson Nov 16 '11 at 22:47
8

This guy ported py-bcrypt to pure python so you can use it on GAE: https://github.com/erlichmen/py-bcrypt

Fábio Diniz
  • 10,077
  • 3
  • 38
  • 45
  • While technically true, with a reasonable number of rounds it is currently slow to actually use. – Jeremy Sep 12 '15 at 13:36