13

I have a table in a PostgreSQL database that has a BIGSERIAL auto-incrementing primary key. Recreate it using:

CREATE TABLE foo
(
  "Id" bigserial PRIMARY KEY,
  "SomeData" text NOT NULL
);

I want to append some data to this table from R via the RPostgreSQL package. In R, the data doesn't include the Id column because I want the database to generate those value.

dfr <- data.frame(SomeData = letters)

Here's the code I used to try and write the data:

library(RPostgreSQL)
conn <- dbConnect(
  "PostgreSQL", 
  user     = "yourname", 
  password = "your password",
  dbname   = "test"
)
dbWriteTable(conn, "foo", dfr, append = TRUE, row.names = FALSE)
dbDisconnect(conn)

Unfortunately, dbWriteTable throws an error:

## Error in postgresqlgetResult(new.con) : 
##   RS-DBI driver: (could not Retrieve the result : ERROR:  invalid input syntax for integer: "a"
## CONTEXT:  COPY foo, line 1, column Id: "a"
## )

The error message isn't completely clear, but I interpret this as R trying to pass the contents of the SomeData column to the first column in the database (which is Id).

How should I be passing the data to PostgreSQL so that the Id column is auto-generated?

Richie Cotton
  • 118,240
  • 47
  • 247
  • 360
  • 1
    I don't believe [this patch](https://groups.google.com/forum/#!msg/rpostgresql-dev/hIilIO_MtYk/vNAnj5x2JHsJ) (long thread, apologies, but it speaks to your issue) ever made it into the RPostgreSQL distribution. It seems that `dbWriteTable` is issuing a `COPY` but there's no way to specify field names (which is causing the issue). There just isn't sufficient intelligence in `dbWriteTable` to see there's a `SERIAL` field. `dbWriteTable2` in `caroline` has a hack-ish workaround (by using it's own `id` field for tables). I'd prbly go with making a version that uses `COPY` with field names. – hrbrmstr Oct 19 '14 at 12:25

2 Answers2

8

From the thread in hrbrmstr's comment, I found a hack to make this work.

In the postgresqlWriteTable in the RPostgreSQL package, you need to replace the line

sql4 <- paste("COPY", postgresqlTableRef(name), "FROM STDIN")

with

sql4 <- paste(
  "COPY ", 
  postgresqlTableRef(name), 
  "(", 
  paste(postgresqlQuoteId(names(value)), collapse = ","), 
  ") FROM STDIN"
)

Note that the quoting of variables (not included in the original hack) is necessary to pass case-sensitive column names.

Here's a script to do that:

body_lines <- deparse(body(RPostgreSQL::postgresqlWriteTable))
new_body_lines <- sub(
  'postgresqlTableRef(name), "FROM STDIN")', 
  'postgresqlTableRef(name), "(", paste(shQuote(names(value)), collapse = ","), ") FROM STDIN")', 
  body_lines,
  fixed = TRUE
)
fn <- RPostgreSQL::postgresqlWriteTable
body(fn) <- parse(text = new_body_lines)
while("RPostgreSQL" %in% search()) detach("package:RPostgreSQL")
assignInNamespace("postgresqlWriteTable", fn, "RPostgreSQL")
Richie Cotton
  • 118,240
  • 47
  • 247
  • 360
  • This is excellent, worked perfectly for me. I honestly don't know why it wasn't able to do this to begin with. – JaredL Mar 12 '15 at 03:25
  • Hi Richie - if I understand correctly, will the script apply that change to the RPostgreSQL package automatically? I executed your script but I am still getting the same, if not very similar error. I might have a different use case... – jgozal Mar 03 '16 at 04:13
  • If you type `RPostgreSQL::postgresqlWriteTable`, do you get the new version of the function or the original? If you have a new problem, ask a new question. – Richie Cotton Mar 06 '16 at 11:51
  • @RichieCotton I get the old/default one! as if the script hadn't affected the function. The `sql4` variable is still `paste("COPY", postgresqlTableRef(name), "FROM STDIN")` Why do you think your script didn't affect the function after I executed it? and how would you suggest I make sure that the change is done? – jgozal Mar 08 '16 at 03:22
  • 2
    It might be easier for you to download the package source, edit the R functions, then install that modified version. – Richie Cotton Mar 08 '16 at 04:59
  • was thinking on doing that, I'm trying with loading a workspace where the function is modified now but if that doesn't work I'll do it directly from the source code – jgozal Mar 08 '16 at 06:20
  • downloaded the package source and changed the source code of the function. It seemed to have solved the issue I had before since the error stopped appearing but now I get another one `Error in postgresqlgetResult(new.con): RS-DBI driver: (could not Retrieve the result : ERROR: invalid input syntax for type boolean: .....` I might make another different question for this one. @RichieCotton thank you so much for your help! despite this happened to you 2 years ago :P – jgozal Mar 08 '16 at 07:18
  • @RichieCotton when I type `RPostgreSQL::postgresqlWriteTable` it shows me the new version of the function, but I get the following error... `Error in postgresqlpqExec(new.con, sql4) : RS-DBI driver: (could not Retrieve the result : ERROR: syntax error at or near "STDIN" LINE 1: COPY "female_view_new3" FROM STDIN` – Gaurav May 31 '16 at 12:19
  • @Gaurav I think this problem is worth a new question. It's too obscure to understand and solve with a comment. – Richie Cotton Jun 05 '16 at 04:40
  • @RichieCotton Have posted a new question here... http://stackoverflow.com/questions/37651470/how-to-write-a-table-in-postgresql-from-r – Gaurav Jun 06 '16 at 07:12
  • @Jumpy89 hi ! how did you edit the function? I tried using `edit(RPostgreSQL::postgresqlWriteTable)` and changed the lines. But it does not save . I've downloaded the package but am having a difficult time finding the function among all the c files. Can you help? – vagabond Jul 12 '16 at 21:33
  • @vagabond Use `fixInNamespace` to change the function from within R. Or if you want to edit the R file to create a custom version of the package, look in the "R" directory. – Richie Cotton Jul 14 '16 at 04:39
  • @RichieCotton brilliant ! thank you . . . my alternate fix to this ordeal was to run the R code in an ipython notebook, push the dataframe to pandas and then write it to db using python `tosql` but i guess it won't be necessary now! – vagabond Jul 14 '16 at 16:12
  • @RichieCotton is there a way to permanently commit these changes to the function? when I restart R, the function reverts to the original code. I'm looking in the R directory, but cannot find the function ! – vagabond Jul 14 '16 at 17:15
  • @vagabond In the current version, `postgresqlWriteTable` is defined on line 596 of `PostgreSQLSupport.R`. Use the "find in files" feature of your favourite text editor to locate this sort of thing. – Richie Cotton Jul 16 '16 at 07:57
  • found it using finder. can't find it after the tar.gz is installed on linux. the same `/usr/local/lib/R/site-library/RPostgreSQL/R` does not contain `PostgreSQLSuport.R`:-( – vagabond Jul 18 '16 at 01:17
  • @RichieCotton I created a new question here: http://stackoverflow.com/questions/38427757/making-change-to-function-in-r-package-and-installing-on-ubuntu – vagabond Jul 18 '16 at 02:28
0

I struggled with an issue very similar to this today, and stumbled across this thread as I tried out different approaches. As of this writing (02/12/2018), it looks like the patch recommended above has been implemented into the latest version of RPostgreSQL::postgresqlWriteTable, but I still kept getting an error indicating that the primary key R assigned to my new rows was duplicated in the source data table.

I ultimately implemented a workaround generating an incrementing primary key in R to append to my inserted data to update the source table in my postgreSQL Db. For my purposes, I only needed to insert one record into my table at a time and I can't imagine this is an optimal solution for inserting a batch of records requiring a serially incremented primary key. Predictably, an error of "table my_table exists in database: aborting assignTable" was thrown when I omitted the 'append=TRUE' from my script; however this option did not automatically assign an incrementing primary key as I had hoped, even with the code patch described above.

drv <- dbDriver("PostgreSQL")
localdb <- dbConnect(drv, dbname= 'MyDatabase',
                      host= 'localhost',
                    port = 5432,
                    user = 'postgres',
                    password= 'MyPassword')

KeyPlusOne <- sum(dbGetQuery(localdb, "SELECT count(*) FROM my_table"),1)
NewRecord <- t(c(KeyPlusOne, 'Var1','Var2','Var3','Var4'))
NewRecord <- as.data.frame(NewRecord)
NewRecord <- setNames(KeyPlusOne, c("PK","VarName1","VarName2","VarName3","VarName4"))

postgresqlWriteTable(localdb, "my_table", NewRecord, append=TRUE, row.names=FALSE)
Paul Sochacki
  • 433
  • 6
  • 8
  • Your last NewRecord line with setNames() throws an error for me: ```Error in names(object) <- nm : 'names' attribute [5] must be the same length as the vector [1]``` – MokeEire May 01 '20 at 22:32