
I am writing a Java application which runs user-submitted Java code. I run each piece of user-submitted code in its own sandbox. This sandbox involves (among other things) running each code submission in a separate process, in a separate JVM (as I understand it, there is no other way to reliably control the memory and CPU usage of the submitted code, short of bytecode-level checks/analysis).

I want each sandboxed process to have access to a certain database. The database is large (around 10 GB, could be significantly larger in the future) and user-submitted code might make many billions of more-or-less random accesses to the database. So it is important that user-submitted code be able to access the database efficiently.

This is what I think I should do: load the database into memory from the main overseer process, and then give each sandboxed process read-only access to the loaded database. How can I do this? (Again, I am working in Java.)

Or do I have the wrong idea? Should I try a different approach?
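For concreteness, here is the kind of mechanism I imagine could work: a read-only memory-mapped view of the snapshot file (my understanding is that the OS page cache shares the mapped pages across processes, so N sandboxes would not need N in-memory copies). The fixed-width record layout below is made up for illustration:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MappedDbReader {

    // Map the snapshot file read-only. The OS page cache shares the
    // mapped pages between every process that maps the same file, so
    // N sandboxes do not cost N copies of the data.
    static MappedByteBuffer map(Path dbFile) {
        try (FileChannel ch = FileChannel.open(dbFile, StandardOpenOption.READ)) {
            // The mapping stays valid after the channel is closed.
            return ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Example access: treat the file as fixed-width 8-byte records
    // (a made-up layout; a real snapshot would need its own decoder).
    static long readRecord(MappedByteBuffer buf, int i) {
        return buf.getLong(i * 8);
    }

    // Demo helper: write a tiny three-record "snapshot" to a temp file.
    static Path writeDemoFile() {
        try {
            Path tmp = Files.createTempFile("snapshot", ".bin");
            ByteBuffer out = ByteBuffer.allocate(3 * 8);
            out.putLong(10L).putLong(20L).putLong(30L);
            Files.write(tmp, out.array());
            return tmp;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        MappedByteBuffer db = map(writeDemoFile());
        System.out.println(readRecord(db, 2)); // prints 30
    }
}
```

One caveat I am aware of: a single `MappedByteBuffer` is limited to 2 GB, so a 10 GB file would have to be split across several mappings.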

  • Someone asked if the DB is read-only or read-write to the JVMs. My answer: The DB is just a big file stored on the hard drive. Then, I parse and load the whole thing into memory in the main process. – user3807539 Jul 07 '14 at 11:31
  • With your approach, how will you ensure that the data is always `correct`? I.e., what if you copy some data (a record) into your memory and then someone writes to the same record? – TheLostMind Jul 07 '14 at 11:35
  • The database will not be modified during the execution of my program. When I say "database" what I really mean is "a snapshot of a database" which is stored as a file on my computer and will not be modified, until I get an updated snapshot of the database a few weeks later. – user3807539 Jul 07 '14 at 11:37
  • You could look up how to set up a local caching mechanism on Google. – TheLostMind Jul 07 '14 at 11:38
  • Possible duplicate of http://stackoverflow.com/questions/502218/sandbox-against-malicious-code-in-a-java-application – Raedwald Jul 07 '14 at 11:51
  • If your database is going to be 10GB, and possibly *much* larger, how do you intend to hold all of that in memory? That does not seem reasonable. – Jamie Cockburn Jul 07 '14 at 12:22

2 Answers


Given the amount of data you are talking about (10 GB, possibly much more), I don't think keeping it all in memory is feasible.

I would recommend going with an SQLite database solution.

From each spawned process, you can open up the database in read-only mode, and access it through standard JDBC calls, or wrap it in some API of your own design.
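For example, with the Xerial sqlite-jdbc driver (one common choice; the `records(id, value)` schema below is made up), each spawned process could open the snapshot like this:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

import org.sqlite.SQLiteConfig;

public class ReadOnlyDb {

    // Open the snapshot read-only: the sandboxed process cannot write
    // through this connection even by accident.
    static Connection openReadOnly(String dbPath) throws SQLException {
        SQLiteConfig config = new SQLiteConfig();
        config.setReadOnly(true);
        return config.createConnection("jdbc:sqlite:" + dbPath);
    }

    // Example lookup against a made-up schema: records(id, value).
    static long lookup(Connection conn, int id) throws SQLException {
        try (PreparedStatement ps =
                 conn.prepareStatement("SELECT value FROM records WHERE id = ?")) {
            ps.setInt(1, id);
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next() ? rs.getLong(1) : -1L;
            }
        }
    }
}
```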

This also has the advantage that you can move to a fully-fledged database solution if performance becomes an issue.

If you don't control the format of your data in the first place, you can easily write an importer that rebuilds the SQLite database from each new snapshot file.
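As a sketch of such an importer (assuming a made-up raw format of consecutive 8-byte id/value pairs, and the Xerial sqlite-jdbc driver), running everything in one transaction keeps the bulk load fast:

```java
import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Statement;

public class SnapshotImporter {

    // Assumes a made-up raw format: consecutive (long id, long value) pairs.
    static void importSnapshot(Path rawFile, String dbPath)
            throws IOException, SQLException {
        try (Connection conn = DriverManager.getConnection("jdbc:sqlite:" + dbPath);
             DataInputStream data = new DataInputStream(
                     new BufferedInputStream(Files.newInputStream(rawFile)))) {
            conn.setAutoCommit(false); // one big transaction is much faster
            try (Statement st = conn.createStatement()) {
                st.executeUpdate(
                    "CREATE TABLE IF NOT EXISTS records (id INTEGER PRIMARY KEY, value INTEGER)");
                st.executeUpdate("DELETE FROM records"); // replace the old snapshot
            }
            try (PreparedStatement ps =
                     conn.prepareStatement("INSERT INTO records (id, value) VALUES (?, ?)")) {
                while (true) {
                    long id, value;
                    try {
                        id = data.readLong();
                        value = data.readLong();
                    } catch (EOFException end) {
                        break; // end of the raw snapshot
                    }
                    ps.setLong(1, id);
                    ps.setLong(2, value);
                    ps.addBatch();
                }
                ps.executeBatch();
            }
            conn.commit();
        }
    }
}
```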

Jamie Cockburn

Do not give them direct access to the database at all. Instead, provide an API for the user-submitted Java programs to use, with that API having no methods for altering the content of the database.
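A minimal sketch of what I mean (all names here are made up): the sandboxed code is only ever handed an interface with no mutating methods, so there is simply nothing it could call to change the data.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// The only view the sandboxed code ever sees: no mutating methods exist.
interface ReadOnlyDatabase {
    Optional<String> get(String key); // look up a single record
    long size();                      // number of records
}

// Overseer-side implementation backed by an immutable copy of the data.
public class InMemoryDatabase implements ReadOnlyDatabase {
    private final Map<String, String> records;

    public InMemoryDatabase(Map<String, String> records) {
        // Defensive copy: later changes to the caller's map can't leak in.
        this.records = Collections.unmodifiableMap(new HashMap<>(records));
    }

    @Override public Optional<String> get(String key) {
        return Optional.ofNullable(records.get(key));
    }

    @Override public long size() {
        return records.size();
    }
}
```

The sandboxed code would receive a `ReadOnlyDatabase` reference (or a proxy to one over IPC) and nothing else.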

Raedwald