I'm about to start working with a dataset that is ~500 GB in size. I'd like to be able to access small components of the data at any given time with Python. I'm considering using PyTables or MongoDB with PyMongo (or Hadoop - thanks Drahkar). Are there other file structures/DBs that I should consider?
Some of the operations I'll be doing include computing distances from one point to another and extracting subsets of the data by index based on boolean tests. The results may eventually go online for a website, but for now the data is intended to be used only on a desktop for analysis.
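For context, here is a minimal sketch of the access pattern I have in mind. It uses `numpy.memmap` purely as a stand-in for whatever chunked on-disk store ends up being chosen (PyTables/HDF5, etc.), and the file, point count, and reference point are all made up for illustration:

```python
import os
import tempfile
import numpy as np

# Hypothetical setup: a large on-disk array of 3-D points that we read
# in small slices rather than loading wholesale. numpy.memmap stands in
# for the real chunked store here.
path = os.path.join(tempfile.mkdtemp(), "points.dat")
n_points = 1_000  # tiny here; the real data would be ~500 GB

# Write some synthetic points to disk.
points = np.memmap(path, dtype=np.float64, mode="w+", shape=(n_points, 3))
points[:] = np.random.default_rng(0).random((n_points, 3))
points.flush()

# Reopen read-only and touch only a small slice of the file.
data = np.memmap(path, dtype=np.float64, mode="r", shape=(n_points, 3))
chunk = data[100:200]  # only this slice needs to be paged in

# Distance from every point in the chunk to a reference point.
ref = np.array([0.5, 0.5, 0.5])
dist = np.linalg.norm(chunk - ref, axis=1)

# Boolean test + index extraction, as described above.
mask = dist < 0.25
indices = np.nonzero(mask)[0] + 100  # indices into the full array
near = chunk[mask]
```

The key property I'm after is that `data[100:200]` only pulls a small window into memory, while the distance and boolean-indexing steps run as ordinary NumPy operations on that window.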
Cheers