
I am using attr_encrypted to encrypt some of my model fields, and I use Tire with Elasticsearch for full-text search through a simple search form. Here is part of my model:

class Student < ActiveRecord::Base

  include Tire::Model::Search
  include Tire::Model::Callbacks

  attr_accessible :name, :surname
  attr_encrypted :name,             :key => 'f98gd9regre9gr9gre9gerh'
  attr_encrypted :surname,          :key => 'f98gd9regre9gr9gre9gerh'

  def self.search(params)
    tire.search(load: true) do
      query { string Student.encrypt_name(params[:search]) } if params[:search].present?
    end  
  end
end

So, for example, if I have the name "John" in the database and I search for "John", the search term is encrypted (Student.encrypt_name(params[:search])) before the query is run, and the matching record is returned. Elasticsearch supports wildcard searches, so searching for "Joh*" should also return that record, but the encrypted form of "Joh" is completely different from the encrypted form of "John", so no results come back. Any solutions would be appreciated.
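To see why the wildcard search fails, here is a minimal Ruby sketch. It assumes a deterministic cipher (AES-256-CBC with a fixed IV, from Ruby's standard openssl library) as a hypothetical stand-in for how the field is encrypted; it is not attr_encrypted's actual implementation. Equal plaintexts give equal ciphertexts, which is why searching for "John" works, but the ciphertext of "Joh" is not a prefix of the ciphertext of "John".

```ruby
require "openssl"
require "digest"

# Hypothetical deterministic field cipher: AES-256-CBC, key derived from a
# passphrase, fixed all-zero IV. Illustration only, NOT attr_encrypted's code.
KEY = Digest::SHA256.digest("example passphrase") # 32-byte key
FIXED_IV = "\0" * 16                               # fixed IV => same input, same output

def deterministic_encrypt(plaintext)
  cipher = OpenSSL::Cipher.new("aes-256-cbc")
  cipher.encrypt
  cipher.key = KEY
  cipher.iv = FIXED_IV
  # Hex-encode the ciphertext so it can be compared and printed easily.
  (cipher.update(plaintext) + cipher.final).unpack1("H*")
end

full   = deterministic_encrypt("John")
again  = deterministic_encrypt("John")
prefix = deterministic_encrypt("Joh")

puts full == again            # prints true: exact-match search still works
puts full.start_with?(prefix) # prints false: a prefix query cannot match the ciphertext
```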

Regards, Radoslav

2 Answers


Short answer: full-text search and client-side encryption are mutually exclusive with the current state of the art.

Longer answers:

  1. You can additionally store the Soundex code of the name in cleartext and compare against it. This compromises both functionality and security; check what Soundex is and judge for yourself.

  2. Store all possible partial matches of the name (or at least a sensible subset of them), encrypted, in a separate table and match by identity, which is possible with deterministically encrypted data. It's a no-go for me, but you can google 'data hashing' and 'inverted index' if you feel adventurous. Note that this hurts security as well.

  3. There are theoretical results, but I haven't found anything close to a practical implementation.
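Option 1 can be sketched with a hand-written American Soundex function (the answer names no library, so this is plain Ruby). The assumption is a hypothetical cleartext `name_soundex` column stored next to the encrypted name; you would index and query that column instead of the ciphertext.

```ruby
# Letter => digit table for American Soundex; vowels, Y, H and W have no digit.
SOUNDEX_CODES = "BFPVCGJKQSXZDTLMNR".chars.zip("111122222222334556".chars).to_h.freeze

def soundex(word)
  letters = word.upcase.gsub(/[^A-Z]/, "")
  return "" if letters.empty?

  first = letters[0]
  digits = []
  last_code = SOUNDEX_CODES[first] # the first letter's code still suppresses an identical next code
  letters[1..-1].each_char do |c|
    next if c == "H" || c == "W"   # H and W do not separate identical codes
    code = SOUNDEX_CODES[c]
    if code.nil?                   # a vowel (or Y) separates codes, so a repeat counts again
      last_code = nil
      next
    end
    digits << code unless code == last_code
    last_code = code
  end
  (first + digits.join)[0, 4].ljust(4, "0") # letter + 3 digits, zero-padded
end

soundex("John")   # => "J500"
soundex("Jon")    # => "J500"
soundex("Robert") # => "R163"
```

Note that soundex("Joh") is "J000", not "J500", so this gives fuzzy whole-name matching rather than the "Joh*" wildcard behaviour from the question; that is part of the functional compromise the answer mentions.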
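Option 2 can be sketched as a prefix-only "blind index": every prefix of the name becomes a keyed, deterministic token that is looked up by exact equality. This sketch swaps in HMAC-SHA256 (a keyed hash) for encryption, since the tokens only need to be compared, never decrypted. The key, method and class names are hypothetical, and an in-memory Hash stands in for the separate table.

```ruby
require "openssl"

# Hypothetical index key; it should be a separate secret, not kept in source code.
INDEX_KEY = "hypothetical-index-key"

# Keyed, deterministic token for one search term (HMAC-SHA256, hex encoded).
def blind_token(term)
  OpenSSL::HMAC.hexdigest(OpenSSL::Digest.new("SHA256"), INDEX_KEY, term.downcase)
end

# Tokens for every prefix of the name ("j", "jo", "joh", "john"): the
# "store all possible partial matches" idea, restricted to prefixes.
def prefix_tokens(name, min_length: 1)
  word = name.downcase
  (min_length..word.length).map { |n| blind_token(word[0, n]) }
end

# In-memory stand-in for the separate table: token => record ids.
class PrefixIndex
  def initialize
    @postings = Hash.new { |hash, token| hash[token] = [] }
  end

  def add(record_id, name)
    prefix_tokens(name).each { |token| @postings[token] << record_id }
  end

  # A trailing-wildcard query such as "Joh*" becomes one exact token lookup.
  def search(query)
    @postings.fetch(blind_token(query.chomp("*")), []).uniq
  end
end

index = PrefixIndex.new
index.add(1, "John")
index.add(2, "Joanna")
index.search("Joh*") # => [1]
index.search("Jo*")  # => [1, 2]
```

Anyone who can read the token table can still see which records share a prefix, and how often, which is the security cost the answer warns about.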

Petar Donchev

If your data set is relatively small, another option is to cache the decrypted data in server memory and run regex matching against the cached data set. This might make sense if:

  1. you have ~1000 students per school
  2. your cache key is such that a given school can only search within its own students
  3. you cache only the minimal set of fields needed for searching, rather than serializing the whole object

Of course, an attacker who gains access to your web server's memory could then read the data. A well-designed cache-flushing policy can partially mitigate this.
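A minimal sketch of this caching approach in plain Ruby, assuming the records have already been decrypted when they are loaded into the cache. The class, method and field names are hypothetical, and a Hash keyed by school id stands in for the real cache store; each `*` in the query is translated into `.*` in an anchored, case-insensitive regex.

```ruby
# Only the fields needed for searching are cached, not whole model objects.
CachedStudent = Struct.new(:id, :name, :surname)

class StudentSearchCache
  def initialize
    @by_school = {} # school id => array of CachedStudent
  end

  # Replace the cached entries for one school (already decrypted by the caller).
  def store(school_id, students)
    @by_school[school_id] = students.dup
  end

  # Drop one school's entries; a cache-flushing policy would call this.
  def flush(school_id)
    @by_school.delete(school_id)
  end

  # Match a wildcard query like "Joh*" only against the given school's students.
  def search(school_id, query)
    parts = query.split("*", -1).map { |part| Regexp.escape(part) }
    pattern = /\A#{parts.join(".*")}\z/i
    @by_school.fetch(school_id, []).select do |s|
      pattern.match?(s.name) || pattern.match?(s.surname)
    end
  end
end

cache = StudentSearchCache.new
cache.store(42, [CachedStudent.new(1, "John", "Smith"), CachedStudent.new(2, "Joanna", "Brown")])
cache.search(42, "Joh*").map(&:id) # => [1]
cache.search(42, "jo*").map(&:id)  # => [1, 2]
cache.search(7, "Joh*")            # => [] (other schools' entries are never consulted)
```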

David Starkey