I manage a PostgreSQL database and am working on a tool for users to access a subset of the database. The database has a number of columns, and in addition we use a huge number of hstore keys to store additional information specific to certain rows in the database. Basic example below
A B C hstore
"foo" 1 4 "Fruit"=>"apple", "Pet"=>"dog", "Country"=>"Norway"
"bar" 4 6 "Pet"=>"cat", "Country"=>"Suriname", "Number"=>"5"
"foobar" 2 8
"baz" 3 1 "Fruit"=>"apple", "Name"=>"David"
The data is routinely exported to a CSV file like this:
COPY tableName TO '/filepath/file.csv' DELIMITER ',' CSV HEADER;
I read this into a Pandas dataframe in Python like this:
import pandas as pd
df = pd.read_csv('/filepath/file.csv')
I then access a subset of the data. This subset should have a common set of hstore keys in most, but not necessarily all rows.
I would like to create a separate column for each of the hstore keys. Where a key does not exist for a row, the cell should be left empty, or filled with a NULL or NAN value, whatever is easiest. What is the most effective way to do this?