I have about 10,000 CSV files that I am trying to import into Postgres; each file is roughly 500 MB to 1 GB. I first read each file into R as a data frame because I have to do a bit of preprocessing on the raw data (filtering out some rows and adding a few additional columns), and then I use dbWriteTable to write the data frame into Postgres:
status = try(dbWriteTable(con, name = "my_table", value = my.df, row.names = FALSE, append = TRUE, overwrite = FALSE))
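For context, the loop is roughly structured like this (a simplified sketch: preprocess_file() and the directory path are placeholders for my actual preprocessing and file locations, and the connection details are taken from the error message below):

    library(RPostgreSQL)

    drv <- dbDriver("PostgreSQL")
    con <- dbConnect(drv, dbname = "mydb", host = "my_address",
                     user = "my_db", password = "...")  # credentials omitted

    files <- list.files("csv_dir", pattern = "\\.csv$", full.names = TRUE)  # ~10,000 files

    for (f in files) {
      my.df <- read.csv(f, stringsAsFactors = FALSE)   # each file is 500 MB - 1 GB
      my.df <- preprocess_file(my.df)                  # placeholder: filter rows, add columns
      status <- try(dbWriteTable(con, name = "my_table", value = my.df,
                                 row.names = FALSE, append = TRUE, overwrite = FALSE))
    }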
Within the same R script I also write some basic metadata (an index of the files) into other tables:
qry = paste0("insert into file_list (file_name) values ('", filename, "')")
dbSendQuery(con, qry)
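A slightly fuller sketch of that step (assuming filename holds the current file's name and that the statement result is cleared with dbClearResult after each insert; the value is pasted directly into the SQL string, with no parameter binding):

    # Record the file name in file_list for each processed file
    qry <- paste0("insert into file_list (file_name) values ('", filename, "')")
    res <- dbSendQuery(con, qry)
    dbClearResult(res)   # assumed: free the result set before the next statement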
The R script usually runs fine but frequently stops with the following message:
Error in postgresqlExecStatement(conn, statement, ...): RS-DBI driver:
(could not run statement: server closed the connection unexpectedly\n\tThis
probably means the server terminated abnormally\n\tbefore or while
processing the request.\n)\n
and, more rarely, with these:
Error in postgresqlCopyInDataframe(new.con, value) : RS-DBI driver:
(PQputCopyData failed: server closed the connection unexpectedly
This probably means the server terminated abnormally before or while
processing the request.)
Error in postgresqlNewConnection(drv, ...) : RS-DBI driver: (could not
connect my_db@my_address on dbname "mydb") Error in
!dbPreExists : invalid argument type
I'm not sure what is causing these errors or how to avoid them. Is this because the connection to the server is being lost, or because the files are too large? Any help is appreciated.
I'm using R 3.3.1 (64-bit) on Windows 7 (64-bit, 8 GB RAM), and Postgres 9.2.
Thanks.