
I'm completely new to managing data with databases, so I hope my question is not too basic, but I did not find anything related using the title keywords...

I want to set up a SQL database to store computation results; these are produced by a Python library. My idea was to use a Python ORM like SQLAlchemy or peewee to store the results in the database. However, the computations are done by several people on many different machines, including some that are not directly connected to the internet: it is therefore impossible to simply use one common database.

What would be useful to me is a way of saving the data in the ORM's format, so that I can read it back directly once I transfer the file to a machine where the main database can be accessed.

To summarize, I want to do:

  • On the 1st machine: Python data -> ORM object -> ORM.fileformat
  • After transfer on a connected machine: ORM.fileformat -> ORM object -> SQL database

Would anyone know if existing ORMs offer that kind of feature?
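To illustrate, here is a minimal sketch of the round-trip I have in mind, using only the standard library instead of an ORM (json as the intermediate "ORM.fileformat", sqlite3 as the main database; the `results` schema is invented for illustration):

```python
import json
import sqlite3

def export_results(results, path):
    """On the offline machine: dump computed rows to a portable file."""
    with open(path, "w") as f:
        json.dump(results, f)

def import_results(path, db_path):
    """On the connected machine: load the file into the main database."""
    with open(path) as f:
        rows = json.load(f)
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS results (name TEXT, value REAL)")
    conn.executemany("INSERT INTO results VALUES (?, ?)",
                     [(r["name"], r["value"]) for r in rows])
    conn.commit()
    return conn
```

The question is whether an ORM can play the role of `export_results`/`import_results` for me.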

Silmathoron
  • You could store it in any format and then transfer it to a main database when you connect. All you want is a consistent format so the transfer will be easy; then all you need is a single database on your local machine. – Padraic Cunningham Mar 19 '16 at 17:28
  • Yes, my question was whether existing ORMs already provided this functionality in order to save me the trouble of writing and parsing my files... which is a pain – Silmathoron Mar 19 '16 at 17:35
  • 1
    You either setup a common db which you say you cannot, you set one up on each machine and do an update on a main db when you connect which you commented on the answer as maybe not being too easy on your cluster or you persist the data in some native python container and do the update when you connect. Outside of those three options I don't see what else you can do. Writing a general function once the format was consistent would be a one off task and should be pretty straight forward, http://dev.mysql.com/doc/refman/5.7/en/load-data.html – Padraic Cunningham Mar 19 '16 at 17:46
  • 1
    Ok, thank you for the link, I'll have a look! – Silmathoron Mar 19 '16 at 17:58

2 Answers


Is there a reason why some of the machines cannot be connected to the internet?

If you really can't, what I would do is set up a database and the Python app on each machine where data is collected or generated. Have each machine use the app to store results in its own local database; later you can create a dump of each machine's database and import those results into one main database.

Not the ideal solution, but it will work.
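A minimal sketch of the dump-and-import idea, using SQLite from the standard library (the table and columns are hypothetical, and a real merge would need `CREATE TABLE IF NOT EXISTS` or other conflict handling when several machines dump the same table):

```python
import sqlite3

def dump_db(conn):
    """On a worker machine: serialize the local database to a SQL script."""
    # iterdump() yields the CREATE TABLE and INSERT statements line by line.
    return "\n".join(conn.iterdump())

def replay_dump(script, main_conn):
    """On the connected machine: replay the dump on the main database."""
    main_conn.executescript(script)

# Usage: local worker database with one result row.
local = sqlite3.connect(":memory:")
local.execute("CREATE TABLE results (value REAL)")
local.execute("INSERT INTO results VALUES (42.0)")
local.commit()

# The script is plain text, so it is easy to transfer between machines.
main = sqlite3.connect(":memory:")
replay_dump(dump_db(local), main)
```

With a server database like MySQL, `mysqldump` plays the role of `dump_db` here.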

  • Yep: for safety reasons, as we are running some of our calculations on supercomputers (which we don't own). Thank you for your answer, I'll try that if it turns out there is no other way... though I don't know if getting a MySQL database to run on the cluster will be easy ^^ – Silmathoron Mar 19 '16 at 17:21
  • 1
    Well, if you cannot install a database on those machines, then you can dump the data into CSV file. I do believe MySQL supports importing data into tables using CSV files. –  Mar 19 '16 at 18:17

Ok, thanks to MAhsan's and Padraic's answers I was able to find out how this can be done: the CSV format is indeed easy to use for import to and export from a database.

Here are examples for SQLAlchemy (import 1, import 2, and export) and peewee.
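For reference, here is a minimal standard-library version of the CSV round-trip (the `results` table is just an example, and the table name is assumed to be trusted, since it is interpolated into the SQL):

```python
import csv
import sqlite3

def table_to_csv(conn, table, path):
    """Export a whole table to CSV, header row included."""
    cur = conn.execute(f"SELECT * FROM {table}")
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow([col[0] for col in cur.description])  # column names
        writer.writerows(cur)

def csv_to_table(path, conn, table):
    """Import a CSV file (with header) into an existing table."""
    with open(path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)  # skip the header row
        placeholders = ",".join("?" * len(header))
        conn.executemany(f"INSERT INTO {table} VALUES ({placeholders})", reader)
    conn.commit()
```

Note that CSV values come back as strings, so the database's type coercion (or an explicit conversion) has to turn them back into numbers.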

Silmathoron
  • 1
    Since you're using Python on both computers you can always use `pickle` serialization, no need for csv. https://docs.python.org/3.6/library/pickle.html – MGP Feb 25 '17 at 13:56
  • did not think about that, I'll give it a try, thanks! – Silmathoron Feb 27 '17 at 07:55
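A minimal sketch of MGP's pickle suggestion (this works for plain Python containers such as lists of tuples; pickling detached ORM instances directly may need more care):

```python
import pickle

def save_rows(rows, path):
    """On the offline machine: serialize computed rows to a binary file."""
    with open(path, "wb") as f:
        pickle.dump(rows, f)

def load_rows(path):
    """On the connected machine: read the rows back, ready to insert."""
    with open(path, "rb") as f:
        return pickle.load(f)
```

The loaded rows can then be handed to the ORM (or `executemany`) to insert into the main database. One caveat: pickle files should only be loaded from trusted sources, since unpickling can execute arbitrary code.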