
I am trying to save a data frame to an AWS Redshift database over a Postgres connection established with dplyr's src_postgres function. As you can see below, one column in the data frame contains a value over 256 characters (some are even longer). When I try to save this data frame to Redshift with dplyr's copy_to function, I get the error below. Is there any way I can increase the character limit so I can save this data frame to AWS Redshift, or does anyone have other suggestions for saving my data frame to Redshift? Thank you.

> nchar(df$text)
[1] 598

> copy_to(conn_dplyr, df, TableName, temporary = FALSE)
Error in postgresqlExecStatement(conn, statement, ...) : 
RS-DBI driver: (could not Retrieve the result : ERROR:  value too long for    type character varying(256)
)
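
For reference, a quick way to see which character columns exceed the 256-character limit (a minimal sketch, assuming the long columns are stored as character rather than factor):

# Maximum string length per character column of df
sapply(df[sapply(df, is.character)], function(col) max(nchar(col), na.rm = TRUE))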
Kevin

2 Answers


This happens because Redshift does not support the TEXT data type. When you declare a column as TEXT, Redshift internally stores it as VARCHAR(256). Instead, declare the column as VARCHAR(1000) (the length is up to you, based on the expected incoming values).
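
Since copy_to() accepts a types argument (the second answer below uses it as well), you can set the VARCHAR width at table-creation time instead of altering it afterwards. A minimal sketch, assuming df has an integer column id and a long character column text (the column names and widths are illustrative; supply a type for every column):

library(dplyr)

# Explicit Redshift types for every column of df; the names and widths
# are illustrative placeholders -- adjust them to match your data frame
col_types <- c(
  id   = "int",
  text = "varchar(1000)"
)

copy_to(conn_dplyr, df, TableName, temporary = FALSE, types = col_types)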

Paladin
  • Thanks Paladin! I understand now, but I am still confused about how I can change the column to varchar(1000). Could you please help me with the syntax to change this before I try to write it to the Redshift DB? Thanks for your assistance. – Kevin Jan 14 '16 at 12:58
  • Not sure I understood what you are asking, but there are two scenarios I can imagine. 1. How to change the Redshift column size: ALTER TABLE t1 ADD COLUMN new_column (___correct_column_definition___); UPDATE t1 SET new_column = column; ALTER TABLE t1 DROP COLUMN column; ALTER TABLE t1 RENAME COLUMN new_column TO column; see http://stackoverflow.com/questions/17101918/alter-column-data-type-in-amazon-redshift 2. How to change it in your code: I'm not sure about dplyr, but there must be a cast/convert function. Use that to convert your text into a varchar(1000) and then pass it to the Redshift function (a sketch of scenario 1 follows below). – Paladin Jan 18 '16 at 05:19
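
To make scenario 1 from the comment above concrete, here is a minimal sketch that runs the column swap through DBI. The table name t1 and column name column are the placeholders from the comment, and conn is assumed to be the underlying DBI connection (e.g. conn_dplyr$con):

library(DBI)

# Redshift cannot ALTER COLUMN ... TYPE in place, so widen the column
# via add / copy / drop / rename; all identifiers here are placeholders
dbExecute(conn, "ALTER TABLE t1 ADD COLUMN new_column varchar(1000)")
dbExecute(conn, "UPDATE t1 SET new_column = column")
dbExecute(conn, "ALTER TABLE t1 DROP COLUMN column")
dbExecute(conn, "ALTER TABLE t1 RENAME COLUMN new_column TO column")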

I had a very similar issue recently and found a workaround. It is not very elegant, but it worked. First, get the class of each column:

getColumnClasses <- function(df) {
  # Take the first row so each column yields a single class name
  return(data.frame(lapply(df[1, ], class)))
}

Then I added a simple lookup function that maps R classes to Redshift types:

rClassToRedshiftType <- function(class) {
  switch(class,
    factor = {
      return('VARCHAR(256)')
    },
    character = {
      return('VARCHAR(65535)')  # Redshift's maximum VARCHAR length
    },
    logical = {
      return('boolean')
    },
    numeric = {
      return('float')
    },
    integer = {
      return('int')
    }
  )
  # Fall through for any unmatched class (e.g. Date/POSIXct)
  return('timestamp')
}

getRedshiftTypesForDataFrame <- function(df) {
  # Returns a named character vector: column name -> Redshift type
  return(
    apply(
      getColumnClasses(df), 2,
      FUN = rClassToRedshiftType
    )
  )
}

Finally, call copy_to with the types parameter:

  dplyr::copy_to(
    connection,
    df, table.name,
    temporary = FALSE, types = getRedshiftTypesForDataFrame(df)
  )

Obviously, if you know the columns in advance, you can define the types vector manually, for example:
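
Something like this (the column names and types are illustrative placeholders):

# Hand-written types vector; one entry per column of df
types <- c(
  id         = "int",
  text       = "VARCHAR(65535)",
  created_at = "timestamp"
)

dplyr::copy_to(connection, df, table.name, temporary = FALSE, types = types)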

Antobiotics