I have 51 massive HDF5 tables, each with enough (well-behaved) data that I cannot load even one of them completely into memory. To make life easier for the rest of my team, I need to transfer this data into a PostgreSQL database (and delete the HDF5 tables). However, this is easier said than done, mainly because of these hurdles:
1. `pandas.read_hdf()` still has a wonky `chunksize` kwarg: SO Question; open GitHub issue
2. `pandas.DataFrame.to_sql()` is monumentally slow and inefficient: open GitHub issue (see my post at the bottom of the issue page). Both of these hurdles show up in the naive pipeline sketched after this list.
3. PostgreSQL does not have a native or third-party foreign data wrapper to deal with HDF5: PostgreSQL wiki article
4. The HDF5 ODBC driver is still nascent: HDF5 ODBC blog
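For reference, here is the naive pipeline I would otherwise use, as a minimal sketch (the connection string, file name, HDF5 key, and table name are all placeholders). It trips over both hurdles 1 and 2:

```python
import pandas as pd
from sqlalchemy import create_engine

# Placeholder connection string and table/file names
engine = create_engine("postgresql://user:password@localhost:5432/mydb")

# Hurdle 1: the chunksize kwarg is the part that misbehaves (see the linked issue)
for chunk in pd.read_hdf("table_01.h5", key="data", chunksize=500000):
    # Hurdle 2: to_sql() funnels everything through INSERT statements,
    # which is what makes it so slow at this scale
    chunk.to_sql("my_table", engine, if_exists="append", index=False)
```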
Basically, going from HDF5 -> Pandas -> PostgreSQL requires surmounting hurdles 1 and 2 with extensive monkey patching (for instance, routing each chunk through COPY instead of to_sql(), as sketched below). And there seems to be no way to go from HDF5 to PostgreSQL directly. Unless I am missing something.
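The closest thing to a workaround I can think of for hurdle 2 is to serialize each chunk to CSV in memory and push it through PostgreSQL's COPY, bypassing to_sql() entirely. A minimal sketch with psycopg2 (the DSN, file name, key, and table name are placeholders, and the target table must already exist with matching columns):

```python
import io
import pandas as pd
import psycopg2

# Placeholder DSN and table/file names
conn = psycopg2.connect("dbname=mydb user=user password=password")
cur = conn.cursor()

for chunk in pd.read_hdf("table_01.h5", key="data", chunksize=500000):
    # Serialize the chunk to CSV in memory
    buf = io.StringIO()
    chunk.to_csv(buf, index=False, header=False)
    buf.seek(0)
    # Stream it into PostgreSQL via COPY, sidestepping to_sql()'s INSERTs
    cur.copy_expert("COPY my_table FROM STDIN WITH (FORMAT csv)", buf)
    conn.commit()

cur.close()
conn.close()
```

This helps with the speed problem but still leans on the flaky `chunksize` kwarg from hurdle 1, so it is only half a solution.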
Perhaps one of you fine users can hint at something I am missing, share some patchwork you created to surmount a similar issue, or offer any suggestions or advice...