I've always found bulk-copy to be the best option, external to R. The insert itself can be significantly faster, and the overhead you pay for it, (1) writing the data to a file and (2) calling an external program, is easily repaid by the much shorter run-time.
Setup for this test:

- win10 (2004)
- docker
- postgres:11 container, using port 35432 on localhost, simple authentication (the `con2` connection used below points at this instance; a sketch of creating it follows this list)
- a `psql` binary in the host OS (where R is running); this should be easy on linux, on windows I grabbed the "zip" (not installer) file from https://www.postgresql.org/download/windows/ and extracted what I needed
- I'm using `data.table::fwrite` to save the file because it's fast; in this case `write.table` and `write.csv` are still much faster than using `DBI::dbWriteTable`, but with your size of data you might prefer something quick
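The `con2` object used throughout is a `DBI` connection to that container. A minimal sketch of creating it, assuming the `RPostgres` driver and the credentials implied by the URI built later (user `postgres`, password `mysecretpassword`, default `postgres` database); adjust for your own driver and credentials:

```r
# Assumed connection setup (not part of the original timings): RPostgres against
# the postgres:11 container exposed on localhost:35432 with simple authentication.
library(DBI)
con2 <- dbConnect(
  RPostgres::Postgres(),
  host     = "127.0.0.1",
  port     = 35432,
  user     = "postgres",
  password = "mysecretpassword",
  dbname   = "postgres"
)
```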
```r
DBI::dbCreateTable(con2, "mt", mtcars)
DBI::dbGetQuery(con2, "select count(*) as n from mt")
#   n
# 1 0

z1000 <- data.table::rbindlist(replicate(1000, mtcars, simplify = FALSE))
nrow(z1000)
# [1] 32000

system.time({
  DBI::dbWriteTable(con2, "mt", z1000, create = FALSE, append = TRUE)
})
#    user  system elapsed
#    1.56    1.09   30.90

system.time({
  data.table::fwrite(z1000, "mt.csv")
  URI <- sprintf("postgresql://%s:%s@%s:%s", "postgres", "mysecretpassword", "127.0.0.1", "35432")
  system(
    sprintf("psql.exe -U postgres -c \"\\copy %s (%s) from %s (FORMAT CSV, HEADER)\" %s",
            "mt", paste(colnames(z1000), collapse = ","),
            sQuote("mt.csv"), URI)
  )
})
# COPY 32000
#    user  system elapsed
#    0.05    0.00    0.19

DBI::dbGetQuery(con2, "select count(*) as n from mt")
#       n
# 1 64000
```
While this is a lot smaller than your data (32K rows, 11 columns, 1.3MB of data), a speedup from 30 seconds to less than 1 second cannot be ignored.
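For reuse, the same fwrite-then-`\copy` pattern could be wrapped in a small helper. This is only a sketch: the function name, the `psql` and `csvfile` arguments, and the plain single-quoting of the file path (instead of `sQuote()`) are my additions, not part of the original runs.

```r
# Illustrative wrapper around the fwrite + psql "\copy" steps above.
# Assumes `psql` is callable (pass e.g. psql = "psql.exe" or a full path) and
# `uri` is a postgresql:// connection URI like the one built above.
pg_bulk_copy <- function(dat, table, uri, psql = "psql", csvfile = "bulk_copy.csv") {
  data.table::fwrite(dat, csvfile)
  system(
    sprintf("%s -U postgres -c \"\\copy %s (%s) from '%s' (FORMAT CSV, HEADER)\" %s",
            psql, table, paste(colnames(dat), collapse = ","),
            csvfile, uri)
  )
}
# e.g. pg_bulk_copy(z1000, "mt", URI, psql = "psql.exe")
```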
Side note: there is also a sizable difference between `dbAppendTable` (slow) and `dbWriteTable`. Comparing `psql` and those two functions:
```r
z100 <- data.table::rbindlist(replicate(100, mtcars, simplify = FALSE))

system.time({
  data.table::fwrite(z100, "mt.csv")
  URI <- sprintf("postgresql://%s:%s@%s:%s", "postgres", "mysecretpassword", "127.0.0.1", "35432")
  system(
    sprintf("/Users/r2/bin/psql -U postgres -c \"\\copy %s (%s) from %s (FORMAT CSV, HEADER)\" %s",
            "mt", paste(colnames(z100), collapse = ","),
            sQuote("mt.csv"), URI)
  )
})
# COPY 3200
#    user  system elapsed
#     0.0     0.0     0.1

system.time({
  DBI::dbWriteTable(con2, "mt", z100, create = FALSE, append = TRUE)
})
#    user  system elapsed
#    0.17    0.04    2.95

system.time({
  # dbAppendTable always appends, so it has no create=/append= arguments
  DBI::dbAppendTable(con2, "mt", z100)
})
#    user  system elapsed
#    0.74    0.33   23.59
```
(I don't want to time `dbAppendTable` with `z1000` above ...)

(For kicks, I ran it with `replicate(10000, ...)` and ran the `psql` and `dbWriteTable` tests again, and they took 2 seconds and 372 seconds, respectively. Your choice :-) ... now I have over 650,000 rows of `mtcars` ... hrmph ... `drop table mt` ...)
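And if you want to clean the demo table up afterwards, the obvious DBI call over the same connection works (shown only for completeness, not part of the timings above):

```r
# drop the demo table; assumes con2 is still open
DBI::dbExecute(con2, "drop table mt")
```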