0

I am trying to connect to sql server with spark-jdbc, using JDBC_SESSION_INIT_STATEMENT to create a temporary table and then download data from the temporary table in the main query.

I have the following code:

//df is org.apache.spark.sql.DataFrameReader
val s = """select * into #tmp_table from ( SELECT op.ID,
          |       op.Date,
          |       op.DocumentID,
          |       op.Amount,
          |       op.AmountCurr,
          |       op.CurrencyID,
          |       operson.ObjectTypeId AS PersonOT,
          |       op.PersonID,
          |       ocontract.ObjectTypeId AS ContractOT,
          |       op.ContractID,
          |       op.DocNum,
          |       op.MomentCreate,
          |       op.ObjectTypeID,
          |       op.OwnerObjectID
          |FROM dbo.Operation op With (Index = IX_Operation_Date) --Без хинта временами уходит в скан всей таблицы
          |LEFT JOIN dbo.Object ocontract ON op.ContractID = ocontract.ID
          |LEFT JOIN dbo.Object operson ON op.PersonID = operson.ID
          |WHERE op.Date>='2019-01-01' and op.Date<'2020-01-01' AND 1=1
          |) wrap_for_single_connect
          |OPTION (LOOP JOIN, FORCE ORDER, MAX_GRANT_PERCENT=25)""".stripMargin

df
  .option(JDBCOptions.JDBC_SESSION_INIT_STATEMENT, s)
  .jdbc(
    jdbcUrl,
    "(select * from tempdb.#tmp_table) sub",
    connectionProps)

i get com.microsoft.sqlserver.jdbc.SQLServerException: Invalid object name '#tmp_table'. And I have a feeling that JDBC_SESSION_INIT_STATEMENT is not working, because I deliberately tried to mess up the request and still got the Invalid object error.

How can I check if the request is working in JDBC_SESSION_INIT_STATEMENT?

Gumada Yaroslav
  • 115
  • 1
  • 10
  • 1
    Temporary tables only persist for as long as the scope that created them exists, and can only be accessed in that scope (or a "child" scope); very likely your `SELECT *` is executed in a separate scope, so cannot be accessed, and (likely) doesn't even exist anymore. – Thom A Jan 23 '23 at 09:58
  • @Larnu Yes, I thought about it, bat how do I generally understand that this request worked? – Gumada Yaroslav Jan 23 '23 at 10:03
  • Try something that's doomed to fail, such as `SELECT 1/0;` which should generate an error message, `Msg 8134 Level 16 State 1 Line 1 Divide by zero error encountered.` – AlwaysLearning Jan 23 '23 at 10:52

1 Answers1

1

One way to know whether your JDBCOptions.JDBC_SESSION_INIT_STATEMENT is executed is to enable INFO logging level for org.apache.spark.sql.execution.datasources.jdbc logger.

That should trigger this line and print out the following message to the logs:

Executing sessionInitStatement: [sql]

Given the comment I don't think you should use it to create a source table to load records from:

// This executes a generic SQL statement (or PL/SQL block) before reading
// the table/query via JDBC. Use this feature to initialize the database
// session environment, e.g. for optimizations and/or troubleshooting.

You should use dbtable or query parameter instead.

Jacek Laskowski
  • 72,696
  • 27
  • 242
  • 420
  • We have to come up with something strange due to the fact that there is an OPTION in the request, it should always be last, but spark always wraps the request in select * from (your request) sub – Gumada Yaroslav Jan 23 '23 at 13:15
  • Also, one of the solutions to the problem is to look at the query history in sql server, [this thread can help](https://stackoverflow.com/questions/5299669/how-to-see-query-history-in-sql-server-management-studio) – Gumada Yaroslav Jan 23 '23 at 13:19