4

Right now, the only way to encrypt a Cassandra database at rest seems to be the enterprise edition, which costs thousands of dollars: How to use Cassandra with TDE (Transparent Data Encryption)

Another solution is to encrypt every value before it enters the database, but then the key will be stored somewhere on every server in plaintext and would be easy to find.

I understand they offer "free" use for certain companies, but this is not an option and I am not authorized to pay $2000/server. How do traditional companies encrypt their distributed databases?

Thanks for the advice

55jimbo
Code Wiget

3 Answers

4

I took the approach of encrypting the data disk on AWS. I added a new volume to the instance and checked the option to encrypt the volume. Then I edited cassandra.yaml to point to the encrypted volume.

LHWizard
  • 1
    To clarify, as someone who is relatively new to this: you encrypted the disk, and now every time there is a request to the disk, the disk decrypts what you asked for on retrieval and re-encrypts it on writes? I understand this is your method, but is this good practice? – Code Wiget Nov 01 '17 at 15:28
  • See [this PCI document](https://www.pcisecuritystandards.org/pdfs/pci_fs_data_storage.pdf). Here's a quote: "Some cryptography solutions encrypt specific fields of information stored in a database; others encrypt a singular file or even the entire disk where data is stored. If full-disk encryption is used, logical access must be managed independently of native operating system access control mechanisms..." – LHWizard Nov 01 '17 at 15:37
  • Encryption at rest is completely different from encrypting the disk. Disk- or volume-level encryption protects against someone stealing the volume, but if they are already in the network it gives no further protection: the data is available in clear text and only the EBS volume is encrypted. To achieve data encryption at rest in a cloud like AWS, one could use the AWS Parameter Store to store an encrypted salt, then encrypt data in flight and persist it. In your own datacenter, there are KMIP (key management) servers which could be leveraged to store the encryption keys. – dilsingi Nov 02 '17 at 14:45
  • @dilsingi, can you elaborate? "store an encrypted salt" - what do you mean? Also, can you clarify this statement: "In your own datacenter, there are KMIP (key management) servers which could be leveraged to store the encryption keys" - I am using Cassandra in my own datacenter, not Amazon's – Code Wiget Nov 04 '17 at 01:27
3

We had a similar requirement in one of our projects. Basically, I used the trigger feature in Cassandra with a custom implementation to perform the encryption. It seems to be working fine for us.

You can refer to the docs below on how to create a trigger, and to a sample implementation of the ITrigger interface:

https://docs.datastax.com/en/cql/3.3/cql/cql_reference/cqlCreateTrigger.html

https://github.com/apache/cassandra/blob/2e5847d29bbdd45fd4fc73f071779d91326ceeba/examples/triggers/src/org/apache/cassandra/triggers/AuditTrigger.java

sayboras
  • 1
    So, if I'm getting this, you use a trigger on every write and read to encrypt/decrypt the data coming in/out? If so, where do you store the encryption key? This sounds like a great idea! – Code Wiget Nov 04 '17 at 01:28
  • 1
    We use a key that is not tied to any server-specific details, so the same key can be used on every node in the cluster. Key rotation and management are handled outside Cassandra itself (e.g. a centralized server, an HSM, etc.) – sayboras Nov 05 '17 at 05:55
  • So is that a "yes" to my first question? My understanding is that you are using a trigger to encrypt/decrypt data, and that trigger goes into an HSM to encrypt the data? Is that correct? – Code Wiget Nov 05 '17 at 14:50
  • I will try to replicate this approach. Thank you for your help. One quick question before this is over - do you use a separate HSM per machine cassandra is running on or 1 central HSM? For production we will have between 40-60 servers in a "cluster", and what makes cassandra great is it scales well and removes a single point of failure - using 1 HSM could introduce a point of failure – Code Wiget Nov 05 '17 at 15:04
  • 1
    I understand your concern. The HSM server is actually maintained by another team in my company. We may see it as `one` server, but it could be just a gateway to another cluster, which helps avoid a single point of failure. In the development environment, we just use one common key across Cassandra nodes. – sayboras Nov 05 '17 at 15:09
1

Encrypting before inserting is a good approach. The keys will live either with each application or on each Cassandra node. There isn't much difference really; either way you should use filesystem permissions to restrict access to the key to just the app's user. There are steps to get more secure from there, like requiring a passphrase to be entered on startup instead of storing the key on disk, but that makes operational tasks horrific.
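
A minimal sketch of the encrypt-before-insert idea, assuming AES-GCM from the JDK's `javax.crypto` and a key the application has already loaded (for example from a file readable only by the app's user). The class `ValueCrypto` and its methods are hypothetical names for illustration, not part of Cassandra or any driver:

```java
import javax.crypto.Cipher;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.nio.ByteBuffer;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;

/** Hypothetical helper: encrypts a column value in the application before it is written. */
public final class ValueCrypto {
    private static final int IV_BYTES = 12;   // commonly recommended IV size for GCM
    private static final int TAG_BITS = 128;  // authentication tag length
    private static final SecureRandom RANDOM = new SecureRandom();

    private ValueCrypto() {}

    /** Wraps raw key bytes (16, 24 or 32 bytes) as an AES key. How the bytes are loaded is up to the app. */
    public static SecretKey keyFromBytes(byte[] raw) {
        return new SecretKeySpec(raw, "AES");
    }

    /** Returns IV followed by ciphertext and tag, suitable for a blob column. */
    public static byte[] encrypt(SecretKey key, byte[] plaintext) throws GeneralSecurityException {
        byte[] iv = new byte[IV_BYTES];
        RANDOM.nextBytes(iv); // fresh random IV for every value
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
        byte[] ciphertext = cipher.doFinal(plaintext);
        return ByteBuffer.allocate(iv.length + ciphertext.length).put(iv).put(ciphertext).array();
    }

    /** Reverses encrypt(); throws if the stored bytes were modified or the key is wrong. */
    public static byte[] decrypt(SecretKey key, byte[] stored) throws GeneralSecurityException {
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        // The IV is read back from the first IV_BYTES of the stored value.
        cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, stored, 0, IV_BYTES));
        return cipher.doFinal(stored, IV_BYTES, stored.length - IV_BYTES);
    }
}
```

The application would call `ValueCrypto.encrypt` on each sensitive value just before binding it to the insert statement, and `ValueCrypto.decrypt` after reading it back. Cassandra only ever sees opaque bytes, so encrypted columns can no longer be filtered or sorted by their plaintext value.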

Chris Lohfink