In my post here I describe, in detail, a technique that allows you to use the index with LIKE
for fast %infix%
search, at the cost of some extra storage:
https://stackoverflow.com/a/22531268/543814
As long as the strings are relatively small, the storage requirement is generally acceptable.
According to Google, the average e-mail address is 25
characters long. This increases your required storage by a factor 12.5
on average, and gives you fast indexed search in return. (See my post for the calculations.)
From my perspective, if you are storing 10'000 e-mail addresses, you should be fine storing (the equivalent of) about 100'000 e-mail addresses, too. If this is what it takes to allow you to use an index, that seems like an acceptable trade-off. Often, disk space is cheap, while non-indexed searches are unaffordable.
If you choose to take this approach, I suggest that you limit the input length of e-mail addresses to 64
characters. Those rare (or attacker) e-mail addresses of such length will require up to 32
times the usual storage. This gives you:
- Protection against an attacker trying to flood your database, since these still aren't very impressive amounts of data.
- The expectation that most e-mail addresses are not of this length anyway.
If you consider 64
characters too harsh a requirement, use 255
instead, for a worst-case storage increase factor of 127.5
. Ridiculous? Possibly. Likely? No. Fast? Very.