
I just designed a PostgreSQL database and need to choose a way of populating it with data. The data consists of txt and csv files, but can generally be any type of file containing characters with delimiters. I'm writing a Java program to give the data a uniform structure (there are lots of different kinds of files, and I need to work out what each column of a file represents so I can map it to a column of my DB). I thought of two ways:

  • Convert the files into one common format (JSON) and have the DB regularly check the JSON file and import its content.

  • Connect directly to the database via JDBC and send the strings to the DB (I still need to create a backup file containing what was inserted into the DB, so in both cases a file is created and written to).

Which would you go with, time-efficiency-wise? I'm tempted to use the first one, as it would be easier to handle a JSON file in the DB. If you have any other suggestions, those would also be welcome!

blank_sam
  • What do you mean by handle a JSON file in the DB? Do you intend to store pure JSON documents in the database? While PostgreSQL's JSONB field is pretty cool, this is not the ideal way to use PostgreSQL; a NoSQL solution would be better. – e4c5 Apr 28 '16 at 14:53
  • Nope, I meant importing the data from the JSON. I know about JSONB, but in my case it wouldn't be very useful. – blank_sam Apr 28 '16 at 15:09

1 Answer


JSON or CSV

If you have the liberty of converting your data to either CSV or JSON, CSV is the one to choose, because you will then be able to use COPY FROM to bulk load large amounts of data into PostgreSQL at once.

CSV is supported by COPY, but JSON is not.
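
Since you are working in Java, you can drive COPY FROM over JDBC through the CopyManager API that ships with the PostgreSQL JDBC driver. Here is a minimal sketch; the connection URL, credentials, table name `my_table`, and file name `data.csv` are placeholders for your own values:

```java
import java.io.FileReader;
import java.io.Reader;
import java.sql.Connection;
import java.sql.DriverManager;

import org.postgresql.PGConnection;
import org.postgresql.copy.CopyManager;

public class CsvBulkLoad {
    public static void main(String[] args) throws Exception {
        // Placeholder connection details; adjust for your environment.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/mydb", "user", "password");
             Reader reader = new FileReader("data.csv")) {

            // CopyManager exposes the COPY protocol over JDBC.
            CopyManager copyManager = conn.unwrap(PGConnection.class).getCopyAPI();

            // Streams the whole CSV into the table in one bulk operation.
            long rows = copyManager.copyIn(
                    "COPY my_table FROM STDIN WITH (FORMAT csv, HEADER true)",
                    reader);
            System.out.println("Loaded " + rows + " rows");
        }
    }
}
```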

Directly inserting values

This is the approach to take if you only need to insert a few (or maybe even a few thousand) records, but it is not suited to large numbers of records because it will be slow.
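
If you do go this route, batching the inserts and committing once at the end is considerably faster than issuing one statement per row. A minimal sketch, assuming a hypothetical table `my_table` with a text column and an integer column:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class BatchInsert {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/mydb", "user", "password");
             PreparedStatement ps = conn.prepareStatement(
                     "INSERT INTO my_table (col_a, col_b) VALUES (?, ?)")) {

            conn.setAutoCommit(false); // commit once at the end, not per row

            // Stand-in for rows parsed out of your delimited files.
            String[][] rows = { { "foo", "1" }, { "bar", "2" } };
            for (String[] row : rows) {
                ps.setString(1, row[0]);
                ps.setInt(2, Integer.parseInt(row[1]));
                ps.addBatch();
            }
            ps.executeBatch(); // sends all queued inserts in one round trip
            conn.commit();
        }
    }
}
```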

If you choose this approach, you can create the backup using COPY TO. However, if you feel that you need to create the backup file with your Java code, choosing CSV as the format means you would be able to bulk load it later, as discussed above.
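
The same CopyManager can produce that backup from Java via COPY TO; again, the table name and output file name are placeholders:

```java
import java.io.FileWriter;
import java.io.Writer;
import java.sql.Connection;
import java.sql.DriverManager;

import org.postgresql.PGConnection;
import org.postgresql.copy.CopyManager;

public class CsvBackup {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/mydb", "user", "password");
             Writer writer = new FileWriter("backup.csv")) {

            CopyManager copyManager = conn.unwrap(PGConnection.class).getCopyAPI();

            // Dumps the table as CSV; the file can be re-imported later with COPY FROM.
            long rows = copyManager.copyOut(
                    "COPY my_table TO STDOUT WITH (FORMAT csv, HEADER true)",
                    writer);
            System.out.println("Backed up " + rows + " rows");
        }
    }
}
```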

e4c5
  • The data can get pretty big sometimes, so the first suggested approach is better. But is JSON that bad to use with COPY FROM? The method described [here](http://stackoverflow.com/questions/33129526/loading-json-data-from-a-file-into-postgres) takes a bit more time than using a CSV, but it's still safe, right? We just have to make sure each line in the JSON file represents a row in the DB (we basically bulk import the JSON file into a temporary table, then extract the desired values and insert them into the corresponding columns). – blank_sam Apr 29 '16 at 09:56
  • Yes, in that particular example the entire record is being loaded as a single column: a JSON field (now made obsolete by JSONB fields). That's not relational. If you want to store your data that way, it's entirely possible and safe, but not optimal; you are not making use of any of the features of an RDBMS. – e4c5 Apr 29 '16 at 10:08
  • So basically there is no other way to get the JSON data in without going through a JSONB field or reading the whole JSON data character by character? – blank_sam Apr 29 '16 at 10:52
  • Well, JSONB isn't actually compulsory; you can store the data in text or varchar fields if you want. The difference is in the versatility of the queries you can execute on them, and in the speed. – e4c5 Apr 29 '16 at 10:55