
I have a query like the following:

select distinct
  'sandbox_' || mp.email as "course_id",
  'Sandbox - ' || mp.first || ' ' || mp.last as "short_name",
  'Sandbox - ' || mp.first || ' ' || mp.last as "long_name",
  'instructor_sandboxes' as "account_id",
  'Sandbox' as "term_id",
  'active' as "status",
  null as "start_date",
  null as "end_date"
from
  table1 mp,
  table2 bi,
  table3 term
where user_type in ('A','B','C','D','E')
  and kerberos_name not like '%/%'
  and kerberos_name != '@staff@'
  and mp.banner_pidm = bi.pidm
  and term.startdate >= sysdate - 365
  and term.enddate <= sysdate + 365/2
  and term.term_code = bi.term_code
order by "course_id"

I'm reading this query in at startup from a text file. I replace all line separators like so:

sql = sql.replace(System.lineSeparator(), " ");
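
For context, here is a minimal sketch of that startup step; the file path and the class/method names are illustrative placeholders, not my actual code:

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class QueryLoader {
    // Read the SQL text from disk and flatten it to a single line,
    // since it will be embedded in a JDBC option value.
    public static String loadQuery(String path) throws IOException {
        String sql = new String(Files.readAllBytes(Paths.get(path)), StandardCharsets.UTF_8);
        // String.replace returns a new string, so the result must be kept.
        return sql.replace(System.lineSeparator(), " ");
    }
}

One caveat: System.lineSeparator() only matches the separator of the platform the JVM runs on, so a file produced on another OS could keep its newlines; replacing "\r\n" and "\n" explicitly (or collapsing runs of whitespace) is more robust.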

I then use the following:

DataFrame df = sqlContext.read().format("jdbc").options(db_properties).load();

where db_properties contains an entry named "dbtable" whose value is the query wrapped in parentheses, i.e. '(sql_statement)'.
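
For completeness, a sketch of how the options are assembled; url, driver, user, password, and dbtable are Spark's standard JDBC option keys, and the connection string and credentials below are placeholders:

import java.util.HashMap;
import java.util.Map;
import org.apache.spark.sql.DataFrame;

// sqlContext and the flattened sql string come from the snippets above.
Map<String, String> db_properties = new HashMap<String, String>();
db_properties.put("url", "jdbc:oracle:thin:@//dbhost:1521/service"); // placeholder
db_properties.put("driver", "oracle.jdbc.OracleDriver");
db_properties.put("user", "username");     // placeholder
db_properties.put("password", "password"); // placeholder
// dbtable must be something that can appear in a FROM clause, so the
// query is wrapped in parentheses (and can be given an alias such as "t",
// which some databases require for a derived table):
db_properties.put("dbtable", "(" + sql + ") t");

DataFrame df = sqlContext.read().format("jdbc").options(db_properties).load();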

I'm getting the following error consistently when using the above to query an Oracle database:

Root Exception stack trace:
java.lang.RuntimeException: [3.15] failure: identifier expected

'sandbox_' || mp.email as "course_id",

          ^
at scala.sys.package$.error(package.scala:27)
at org.apache.spark.sql.catalyst.AbstractSparkSQLParser.parse(AbstractSparkSQLParser.scala:36)
at org.apache.spark.sql.catalyst.DefaultParserDialect.parse(ParserDialect.scala:67)
at org.apache.spark.sql.SQLContext$$anonfun$2.apply(SQLContext.scala:211)
at org.apache.spark.sql.SQLContext$$anonfun$2.apply(SQLContext.scala:211)
at org.apache.spark.sql.execution.SparkSQLParser$$anonfun$org$apache$spark$sql$execution$SparkSQLParser$$others$1.apply(SparkSQLParser.scala:114)
at org.apache.spark.sql.execution.SparkSQLParser$$anonfun$org$apache$spark$sql$execution$SparkSQLParser$$others$1.apply(SparkSQLParser.scala:113)
at scala.util.parsing.combinator.Parsers$Success.map(Parsers.scala:137)
at scala.util.parsing.combinator.Parsers$Success.map(Parsers.scala:136)
at scala.util.parsing.combinator.Parsers$Parser$$anonfun$map$1.apply(Parsers.scala:237)
at scala.util.parsing.combinator.Parsers$Parser$$anonfun$map$1.apply(Parsers.scala:237)
at scala.util.parsing.combinator.Parsers$$anon$3.apply(Parsers.scala:217)
at scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1$$anonfun$apply$2.apply(Parsers.scala:249)
at scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1$$anonfun$apply$2.apply(Parsers.scala:249)
at scala.util.parsing.combinator.Parsers$Failure.append(Parsers.scala:197)
at scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1.apply(Parsers.scala:249)
at scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1.apply(Parsers.scala:249)
at scala.util.parsing.combinator.Parsers$$anon$3.apply(Parsers.scala:217)
at scala.util.parsing.combinator.Parsers$$anon$2$$anonfun$apply$14.apply(Parsers.scala:882)
at scala.util.parsing.combinator.Parsers$$anon$2$$anonfun$apply$14.apply(Parsers.scala:882)
  • `||` is not portable and is not supported in Spark or Hive. This query could work only if it was passed directly as `dbtable` (http://stackoverflow.com/a/38729672/6910411) to be executed on the source, not when used in Spark SQL. – zero323 Apr 03 '17 at 23:11
  • I believe that's what I'm doing, though. This is what I said above: DataFrame df = sqlContext.read().format("jdbc").options(db_properties).load(); where db_properties contains a field called "dbtable" with '(sql_statement)' as the value. – Gary Sharpe Apr 03 '17 at 23:30
  • Can you add the full code? Note that `dbtable` has to be a subquery. – zero323 Apr 04 '17 at 01:19
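
To illustrate the distinction drawn in the comments (a sketch reusing the sql string and db_properties map from above):

// Expected to fail in Spark 1.x: the string is handled by Spark's own SQL
// parser, which does not accept Oracle's || operator, giving the
// "failure: identifier expected" error shown in the stack trace.
// DataFrame viaSparkSql = sqlContext.sql(sql);

// Expected to work: as a parenthesized dbtable subquery, the text is sent
// to Oracle verbatim and Oracle evaluates || itself.
db_properties.put("dbtable", "(" + sql + ") t");
DataFrame viaJdbc = sqlContext.read().format("jdbc").options(db_properties).load();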

0 Answers