1

I am currently working on a project in which I need to package a database along with a Java API. The database should be secured in the sense that no one should be able to access it except through the API. The point is that the data in the database is intellectual property and should not be exposed. Here is the catch: the database contains approximately 5 million records (4-5 columns only), and I need to query it on indexed fields, with aggregate functions as well. I am aware that there are embedded Java databases such as Derby and HSQL, but I seriously doubt their performance. I know the requirement sounds weird, but at least it's a start.

The API is meant to be accessed concurrently and needs to retrieve lookup values from this database. Is this really a good design, or is there something wrong with the approach? Any architectural suggestions are welcome.

New info: If an embedded database does not seem promising, how about an embedded file shipped along with the API? Would this be feasible?

Franklin

3 Answers

2

If you're distributing the database, you're not going to prevent people from accessing it directly. That's just DRM, and as we know from long experience, DRM doesn't work.

The simplest way to keep the data private is to keep it on a server you control, and provide a network API. That also allows you to use the database of your choice.
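As a rough illustration of the network API idea, here is a minimal sketch using the JDK's built-in `com.sun.net.httpserver.HttpServer`. The class name `LookupServer`, the `/lookup` path, the `key` query parameter, and the tiny in-memory table are all assumptions made up for this example; a real service would sit in front of the actual database and add authentication and TLS.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.Map;

public class LookupServer {
    // Hypothetical stand-in for the real database; it never leaves the server.
    private static final Map<String, String> TABLE =
            Map.of("DE", "Germany", "US", "United States");

    // Starts the server on the given port (0 picks a free port) and returns it.
    public static HttpServer start(int port) throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", port), 0);
        server.createContext("/lookup", exchange -> {
            // Expect a query string of the form "key=DE" (assumed format).
            String query = exchange.getRequestURI().getQuery();
            String key = (query != null && query.startsWith("key=")) ? query.substring(4) : "";
            String value = TABLE.get(key);
            byte[] body = (value != null ? value : "not found").getBytes(StandardCharsets.UTF_8);
            // Only the single requested value is returned, never the whole table.
            exchange.sendResponseHeaders(value != null ? 200 : 404, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        return server;
    }

    public static void main(String[] args) throws Exception {
        HttpServer server = start(8080);
        System.out.println("Listening on port " + server.getAddress().getPort());
    }
}
```

With this running, a client request to `/lookup?key=DE` receives only that one value, so the full data set stays on the machine you control.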

If you do keep it on the client, I think you'll find the embedded databases perform better than you expect. Another good option is SQLite.

I'm not sure exactly what kind of concurrency you need. Do you envision multiple programs on a given client accessing the database simultaneously?

Matthew Flaschen
  • Seconding the suggestions from @Matthew Flaschen and @denisk - either store the db on a server you control and access it via a network connection, or encrypt the database and access an auth server to get access. – cofiem Mar 06 '11 at 08:01
  • Thanks for the info. I envision multiple programs using the API concurrently, which means concurrent access to the DB as well. Unfortunately, I cannot afford to host the data over the network due to security paranoia. The data has to be on the client side. Isn't there a technology that can address this? Or am I thinking about this wrong? – Franklin Mar 06 '11 at 09:58
  • How could it be more secure to distribute all the data to every client? A secure API that provides only the required data on demand seems much better. There is no (*working*) technology to give a client data then tell him he can't use it. No amount of keys changes this fundamental fact. As far as concurrency, you might start with [this question](http://stackoverflow.com/questions/1438817/which-embedded-database-has-maximum-sql-compliance-and-concurrency-support). – Matthew Flaschen Mar 08 '11 at 21:28
  • @Matthew: My gut feeling keeps telling me that as soon as I ship the data to the client, complete control is lost. Unfortunately, the Java API (which I am to build) is being compared with a C/C++ API in which the data is stored in some binary DLL (I guess) and hence cannot be reverse engineered. What's your take on this? Is it not possible to build a closed-source API with Java? – Franklin Mar 13 '11 at 06:48
  • The data *can* be extracted from a DLL. If it hasn't already been reverse-engineered, that just means the developers are lucky. I still think the best you can do is a network API. – Matthew Flaschen Mar 14 '11 at 01:43
1

I would use the H2 database with the pure JDBC API. H2 is very fast and supports encryption. For security reasons, you could store the key to your encrypted database somewhere else (on some auth server?). And you probably won't be able to find something faster than pure JDBC.

  • How does the auth server solve anything? You still need some kind of authentication token on disk, and that basically becomes the decryption key. Once a single token is compromised, your database is open to the world. And I don't see why "pure JDBC" matters. JDBC is just an API. If you write a very fast C database, the time you spend in the JDBC wrapper will be insignificant next to querying time. – Matthew Flaschen Mar 06 '11 at 08:04
  • I do not claim this is the best solution ever; it's obvious that the DB should be outside the system for a reasonable level of security. But it may be OK for fulfilling the requirements. Pure JDBC would be faster than, for instance, an ORM solution (for example, Hibernate) in most cases. We're not talking about C here. – Denys Kniazhev-Support Ukraine Mar 06 '11 at 10:01
0

I agree with Matthew that you simply cannot prevent the user from getting access to it. You also make it pretty easy if you are using a standard DB engine. You'll pretty much be limited to encrypting the data and then decrypting it into memory so that you can query it.

SQLite is a good embedded database, but does poorly with concurrent writes (it allows many readers but only one writer at a time). HSQL, Derby or anything else should work pretty well if you can keep it in memory.
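To make the "encrypt, then decrypt into memory" idea concrete, here is a minimal sketch using only the JDK's `javax.crypto` classes. The class name `EncryptedLookup`, the `key=value` line format, and the sample records are assumptions invented for this example. Note that the AES key still has to reach the client somehow, so this only raises the bar; it does not stop a determined user from extracting the data.

```java
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class EncryptedLookup {
    private static final int IV_BYTES = 12;   // recommended IV size for GCM
    private static final int TAG_BITS = 128;  // authentication tag length

    // Encrypts the payload with AES-GCM and prepends the random IV to the ciphertext.
    public static byte[] encrypt(SecretKey key, String plaintext) throws Exception {
        byte[] iv = new byte[IV_BYTES];
        new SecureRandom().nextBytes(iv);
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
        byte[] ct = cipher.doFinal(plaintext.getBytes(StandardCharsets.UTF_8));
        byte[] out = new byte[iv.length + ct.length];
        System.arraycopy(iv, 0, out, 0, iv.length);
        System.arraycopy(ct, 0, out, iv.length, ct.length);
        return out;
    }

    // Decrypts the blob and loads "key=value" lines (assumed format) into a
    // thread-safe map, so concurrent callers can read without extra locking.
    public static Map<String, String> decryptToMemory(SecretKey key, byte[] blob) throws Exception {
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, blob, 0, IV_BYTES));
        byte[] pt = cipher.doFinal(blob, IV_BYTES, blob.length - IV_BYTES);
        Map<String, String> table = new ConcurrentHashMap<>();
        for (String line : new String(pt, StandardCharsets.UTF_8).split("\n")) {
            int eq = line.indexOf('=');
            if (eq > 0) {
                table.put(line.substring(0, eq), line.substring(eq + 1));
            }
        }
        return table;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical key handling: a real deployment must decide where the key lives.
        KeyGenerator gen = KeyGenerator.getInstance("AES");
        gen.init(128);
        SecretKey key = gen.generateKey();
        byte[] blob = encrypt(key, "US=United States\nDE=Germany");
        Map<String, String> table = decryptToMemory(key, blob);
        System.out.println(table.get("DE"));
    }
}
```

Once decrypted, the plaintext lives in the process's memory, which is exactly where a user with a debugger or heap dump can read it.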

Kevin Peterson