I'm saving a really large data.frame (30 million rows) from R to a PostgreSQL database, and it kills my PC. As this is the result of calculations produced by dplyr, I'd like to use some built-in functionality of that package, but copy_to doesn't work for such huge tables. Any suggestions?
- Loop through it 1000 rows (more or less) at a time to insert it in batches? (See the sketch after these comments.) – DrColossos Jun 18 '14 at 10:24
- http://stackoverflow.com/q/12206600/398670 – Craig Ringer Jun 18 '14 at 10:27
- Thanks, splitting is the best solution. – kismsu Jun 27 '14 at 16:52
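A minimal sketch of the batching idea from the first comment, assuming the 30-million-row result is in a data frame df, the target table is called mytable, and the connection uses the RPostgreSQL driver; the connection details, table name, and chunk size are all placeholders to adjust:

    library(DBI)
    library(RPostgreSQL)

    # Placeholder connection details -- adjust for your own database.
    con <- dbConnect(PostgreSQL(), dbname = "mydb", host = "localhost",
                     user = "user", password = "secret")

    chunk_size <- 100000   # rows per batch; tune to available memory
    n <- nrow(df)

    # The target table is assumed to exist already; each slice is appended to it.
    for (i in seq(1, n, by = chunk_size)) {
      j <- min(i + chunk_size - 1, n)
      dbWriteTable(con, "mytable", df[i:j, , drop = FALSE],
                   append = TRUE, row.names = FALSE)
    }

    dbDisconnect(con)

Each iteration hands only one slice of the data frame to the driver, so the serialization overhead per insert stays bounded instead of building one enormous statement.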
1 Answer
Can you copy the data frame to a CSV or tab-delimited text file, then load that into PostgreSQL with the COPY FROM command [1]? That implements a bulk-load approach which may perform faster.
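A rough sketch of that route from the R side, assuming the result is in a data frame df and the output path is a placeholder; tab-delimited text with \N for NULLs matches COPY's default text format:

    # Dump the data frame as tab-delimited text in COPY's default text format.
    outfile <- "/tmp/mytable.tsv"
    write.table(df, file = outfile, sep = "\t", quote = FALSE,
                row.names = FALSE, col.names = FALSE, na = "\\N")

From there, either run COPY mytable FROM '/tmp/mytable.tsv' on the server (which requires the file to be readable by the server process), or use psql's client-side \copy mytable FROM '/tmp/mytable.tsv' from the machine where the file lives. Note that quote = FALSE assumes the data contains no embedded tabs, newlines, or backslashes; if it might, write CSV instead and load with FORMAT csv.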
In some cases, it may be possible to have an R script emit the data as a stream and pipe it directly into psql:
<Rscript emitting tab-delimited rows> | psql -c "COPY <tablename> (columnlist, ...) FROM STDIN WITH (FORMAT text)"
In some long-running cases, I put | pv | in the middle of the pipeline to track progress (http://www.ivarch.com/programs/pv.shtml).
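For the R side of that pipeline, something along these lines should work; the script name, the readRDS() call, and the table and column names are placeholders for however the dplyr result is actually produced:

    #!/usr/bin/env Rscript
    # emit_rows.R -- stream the result as tab-delimited rows on stdout.
    df <- readRDS("result.rds")   # placeholder for the real dplyr pipeline

    # No header, tab separators, \N for NULL: what COPY's text format expects.
    write.table(df, file = stdout(), sep = "\t", quote = FALSE,
                row.names = FALSE, col.names = FALSE, na = "\\N")

Invoked roughly as Rscript emit_rows.R | psql mydb -c "COPY mytable (col1, col2) FROM STDIN WITH (FORMAT text)", with the table and column names adjusted to match.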
[1] http://www.postgresql.org/docs/current/interactive/sql-copy.html

kermatt