I am trying to copy the contents of a data frame to an Impala database, but I get an error when doing so.
library(RJDBC)
library(implyr)
drv <- JDBC("com.cloudera.impala.jdbc41.Driver","/User/ImpalaJDBC41.jar",identifier.quote="`")
conn <- dbConnect(drv, "username/password")
RJDBC::dbWriteTable(conn, 'default.segments', df)
I get the following error:
Error in .local(conn, statement, ...) :
execute JDBC update query failed in dbSendUpdate ([Cloudera][ImpalaJDBCDriver](500051) ERROR processing query/statement. Error Code: 0, SQL state: TStatus(statusCode:ERROR_STATUS, sqlState:HY000, errorMessage:AnalysisException: Syntax error in line 1:
...ents (id DOUBLE PRECISION,eventdate VARCH...
^
Encountered: IDENTIFIER
Expected: BLOCK_SIZE, COMMENT, COMPRESSION, DEFAULT, ENCODING, INTERMEDIATE, LOCATION, NOT, NULL, PRIMARY, COMMA
CAUSED BY: Exception: Syntax error
), Query: CREATE TABLE default.segments (id DOUBLE
PRECISION,eventdate VARCHAR(255),segment INTEGER).)
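Judging by the query echoed in the error, dbWriteTable appears to have mapped the numeric id column to DOUBLE PRECISION, and the parser stops right after DOUBLE. A minimal sketch of building the CREATE TABLE statement by hand instead is shown here. It assumes Impala's DOUBLE, INT, TIMESTAMP and STRING types are acceptable targets; impala_type is a hypothetical helper, not part of RJDBC, and the data frame is a toy stand-in for the real one.

```r
# Toy data frame with the same column classes as the one in the question.
df <- data.frame(
  id = c(3, 3, 69),
  eventdate = as.Date(c("2016-05-29", "2016-05-30", "2016-06-01")),
  segment = c(79L, 76L, 18L)
)

# Hypothetical helper (not part of RJDBC): map an R column to an Impala type.
impala_type <- function(x) {
  if (inherits(x, c("Date", "POSIXct"))) {
    "TIMESTAMP"
  } else if (is.factor(x) || is.character(x)) {
    "STRING"
  } else if (is.integer(x)) {
    "INT"
  } else if (is.numeric(x)) {
    "DOUBLE"
  } else {
    "STRING"  # fallback for anything not handled above
  }
}

# One type per column, then a single "name TYPE, name TYPE, ..." list.
col_types <- vapply(df, impala_type, character(1))
ddl <- sprintf(
  "CREATE TABLE default.segments (%s)",
  paste(names(col_types), col_types, collapse = ", ")
)
print(ddl)

# With a live connection the statement could then be sent as:
# RJDBC::dbSendUpdate(conn, ddl)
```

The statement is only built and printed here; sending it requires the connection from the snippet at the top.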
Assuming something was wrong with the data types, I created the table myself with explicit column types and then tried to insert the values:
RJDBC::dbSendUpdate(conn, paste("CREATE TABLE default.segments (id bigint,eventdate timestamp, segment bigint)",";"))
state1 <- paste0("INSERT INTO default.segments VALUES (", apply(df, 1, function(x) paste(x, collapse = ",")), ")" )
RJDBC::dbSendUpdate(conn, state1)
This also fails, with an error related to data types:
Error in .local(conn, statement, ...) :
execute JDBC update query failed in dbSendUpdate ([Cloudera]
[ImpalaJDBCDriver](500051) ERROR processing query/statement. Error Code: 0,
SQL state: TStatus(statusCode:ERROR_STATUS, sqlState:HY000,
errorMessage:AnalysisException: Target table
'default.segments' is incompatible with source expressions.
Expression '2016 - 5 - 29' (type: BIGINT) is not compatible with column
'eventdate' (type: TIMESTAMP)
), Query: INSERT INTO default.segments VALUES ( 3,2016-
05-29, 79).)
Below is the structure of my data frame:
> str(df)
'data.frame': 19065 obs. of 3 variables:
$ id: num 3 3 3 69 102 102 102 102 102 102 ...
$ eventdate: Date, format: "2016-05-29" ...
$ segment: int 79 76 76 18 11 15 7 11 7 11 ...
The last error says that expression '2016 - 5 - 29' (type: BIGINT) is not compatible with column 'eventdate' (type: TIMESTAMP), but the eventdate column in my data frame is of class Date. What could be the issue? Can someone please help?
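One possible reading of the second error: the date is pasted into the SQL text without quotes, so 2016-05-29 is parsed as the integer subtraction 2016 - 5 - 29 rather than as a date. The sketch below quotes and casts the date while building the VALUES tuples column-wise. It assumes a toy data frame shaped like mine, that Impala accepts CAST('yyyy-mm-dd' AS TIMESTAMP), and that a multi-row VALUES list is acceptable; the variable names rows and stmt are illustrative only.

```r
# Toy data frame with the same column classes as the one in the question.
df <- data.frame(
  id = c(3, 3, 69),
  eventdate = as.Date(c("2016-05-29", "2016-05-30", "2016-06-01")),
  segment = c(79L, 76L, 18L)
)

# Build one tuple per row, column by column, so each column keeps its own
# formatting instead of being coerced through apply()'s character matrix.
rows <- sprintf(
  "(%s, CAST('%s' AS TIMESTAMP), %d)",
  format(df$id, scientific = FALSE, trim = TRUE),
  format(df$eventdate, "%Y-%m-%d"),  # quoted in the template above
  df$segment
)

# Collapse all tuples into a single multi-row INSERT statement.
stmt <- paste0(
  "INSERT INTO default.segments VALUES ",
  paste(rows, collapse = ", ")
)
print(stmt)

# With a live connection the statement could then be sent as:
# RJDBC::dbSendUpdate(conn, stmt)
```

For a data frame of about 19,000 rows, a single statement may become very large, so splitting rows into chunks and sending one INSERT per chunk might be more practical.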