
I would like to bulk-INSERT/UPSERT a moderately large number of rows into a PostgreSQL database using R. In order to do so I am preparing a multi-row INSERT string in R.
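
For illustration, `md_values` is just a comma-separated list of row tuples assembled in R, roughly like this (a simplified sketch; `df` here is a stand-in for my actual data, and escaping is glossed over):

    # Sketch only: build the VALUES part of the multi-row INSERT from a data
    # frame `df` with character columns ts_key and meta_data (placeholders).
    # Real code should escape the values properly, e.g. via DBI::dbQuoteString().
    md_values <- paste(
      sprintf("('%s', '%s'::hstore)", df$ts_key, df$meta_data),
      collapse = ", "
    )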

query <- sprintf("BEGIN;
                  CREATE TEMPORARY TABLE
                  md_updates(ts_key varchar, meta_data hstore) ON COMMIT DROP;

                  INSERT INTO md_updates(ts_key, meta_data) VALUES %s;
                  LOCK TABLE %s.meta_data_unlocalized IN EXCLUSIVE MODE;

                  UPDATE %s.meta_data_unlocalized
                  SET meta_data = md_updates.meta_data
                  FROM md_updates
                  WHERE md_updates.ts_key = %s.meta_data_unlocalized.ts_key;
                  COMMIT;", md_values, schema, schema, schema)

DBI::dbGetQuery(con,query)

The entire function can be found here. Surprisingly (at least to me) I learned that the UPDATE part is not the problem: I left it out, ran the query again, and it wasn't much faster. INSERTing a million+ records seems to be the issue here.

I did some research and found quite a bit of information:

  • bulk inserts

  • bulk inserts II

  • what causes large inserts to slow down

Answers from @Erwin Brandstetter and @Craig Ringer were particularly helpful: I was able to speed things up quite a bit by dropping indices and following a few other suggestions.
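
For illustration, the index handling boils down to something like this (the index name below is a made-up placeholder, not my actual schema):

    # Drop a secondary index before the bulk load and rebuild it afterwards.
    # Index and table names are placeholders.
    DBI::dbGetQuery(con, "DROP INDEX IF EXISTS meta_data_unlocalized_ts_key_idx;")
    # ... run the bulk INSERT / UPDATE here ...
    DBI::dbGetQuery(con, "CREATE INDEX meta_data_unlocalized_ts_key_idx
                          ON meta_data_unlocalized (ts_key);")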

However, I struggled to implement another suggestion that sounded promising: COPY. The problem is that I can't get it done from within R.

The following works for me:

sql <- sprintf('CREATE TABLE
                md_updates(ts_key varchar, meta_data hstore);
                COPY md_updates FROM STDIN;')

dbGetQuery(sandbox, "COPY md_updates FROM 'test.csv' DELIMITER ';' CSV;")
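
(For completeness, the test.csv here is just the in-memory data frame written out beforehand, roughly like this, with `df` again standing in for my actual data:)

    # Workaround: dump the data frame to the CSV that the server-side COPY
    # then reads. Column order must match md_updates.
    write.table(df, file = "test.csv", sep = ";",
                row.names = FALSE, col.names = FALSE)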

But I can't get it done without going through an extra .csv file. So my questions are:

  • Is COPY really a promising approach here (over the multi-row INSERT I've got)?

  • Is there a way to use COPY from within R without writing the data to a file? The data fits in memory, and since it's already in memory, why write it to disk?

I am using PostgreSQL 9.5 on OS X and on RHEL, respectively.

Matt Bannert

1 Answer


RPostgreSQL has a "CopyInDataframe" function that looks like it should do what you want:

install.packages("RPostgreSQL")
library(RPostgreSQL)
con <- dbConnect(PostgreSQL(), user="...", password="...", dbname="...", host="...")
dbSendQuery(con, "copy foo from stdin")
postgresqlCopyInDataframe(con, df)

where table `foo` has the same columns as data frame `df`.
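
Applied to the question's staging-table update, the whole round trip could look roughly like this (a sketch, not tested against your data; the schema name and `df` are placeholders, and `df$meta_data` is assumed to hold valid hstore text literals):

    library(RPostgreSQL)

    con <- dbConnect(PostgreSQL(), user="...", password="...", dbname="...", host="...")

    # stage the rows with COPY instead of a multi-row INSERT
    dbSendQuery(con, "CREATE TEMPORARY TABLE md_updates (ts_key varchar, meta_data hstore);")
    dbSendQuery(con, "COPY md_updates FROM STDIN;")
    postgresqlCopyInDataframe(con, df)   # df has columns ts_key and meta_data

    # then apply the UPDATE from the staging table, as in the question
    dbSendQuery(con, "UPDATE some_schema.meta_data_unlocalized AS t
                      SET meta_data = u.meta_data
                      FROM md_updates AS u
                      WHERE u.ts_key = t.ts_key;")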

Neil McGuigan
  • thanks +1 this actually works. I've been using the `RPostgreSQL` package for years but didn't search for such a function. It does exactly what I want. Will run a couple of benchmarks to see whether it really speeds things up. – Matt Bannert Apr 12 '17 at 10:29
  • thanks for your help. It's actually 6 times faster for larger `INSERT`s. – Matt Bannert Apr 12 '17 at 15:39
  • This link seems to be very helpful too to append more records on the postgres table - https://stackoverflow.com/questions/33634713/rpostgresql-import-dataframe-into-a-table – Joni Hoppen Oct 09 '21 at 23:24