We load data from Oracle into a Dataset like this:
val dataset = sqlContext.read.format("jdbc").options(Map(
  "driver" -> applicationConfig.getString("oracle.driver"),
  "url" -> applicationConfig.getString("oracle.url"),
  "user" -> applicationConfig.getString("oracle.user"),
  "password" -> applicationConfig.getString("oracle.password"),
  "dbtable" -> query
)).load().as[CaseClass]
CaseClass looks like:
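case class CaseClass (
  // java.lang.Long rather than scala.Long: a Scala Long cannot default to null,
  // and the boxed type is what yields "long (nullable = true)" in the schema.
  RELNR: java.lang.Long = null,
  INS_CONTACTHIST_DATE: Timestamp = null,
  CONTACTDATETIME: Timestamp = null,
  CONTACTSTATUSID: java.lang.Long = null,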
...
I want to create a new, empty Dataset[CaseClass]:
import sqlContext.implicits._
val acc = sqlContext.createDataset[CaseClass](Seq())
and fill it in a few iterations with filtered data from dataset:
val possibilities = dataset.filter(c => predicate(c))
acc.union(possibilities)
This fails with the error: unresolved operator 'Union;
From SO I learned that this error is caused by incompatible schemas, and calling printSchema()
on both datasets confirms that some column types differ:
Oracle:
|-- RELNR: decimal(10,0) (nullable = true)
|-- INS_CONTACTHIST_DATE: date (nullable = true)
|-- CONTACTDATETIME: timestamp (nullable = true)
|-- CONTACTSTATUSID: decimal(19,0) (nullable = true)
Empty dataset:
|-- RELNR: long (nullable = true)
|-- INS_CONTACTHIST_DATE: timestamp (nullable = true)
|-- CONTACTDATETIME: timestamp (nullable = true)
|-- CONTACTSTATUSID: long (nullable = true)
How can I make the union work? Or, alternatively, how can I force the sqlContext.read call
to populate the dataset using the property types of CaseClass?
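For reference, one direction that might work is to cast the mismatched columns on the untyped DataFrame before calling .as[CaseClass]. The sketch below is only an illustration, not a confirmed fix. The helper name alignToCaseClass is hypothetical, it only covers the columns shown in the schemas above (the elided fields would need the same treatment), and it assumes the decimal values fit into a Long.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.{LongType, TimestampType}

// Hypothetical helper: `raw` is the DataFrame returned by load(),
// before .as[CaseClass] is applied.
def alignToCaseClass(raw: DataFrame): DataFrame =
  raw
    // decimal(10,0) -> long (assumes the values fit into a Long)
    .withColumn("RELNR", col("RELNR").cast(LongType))
    // date -> timestamp
    .withColumn("INS_CONTACTHIST_DATE", col("INS_CONTACTHIST_DATE").cast(TimestampType))
    // decimal(19,0) -> long (assumes the values fit into a Long)
    .withColumn("CONTACTSTATUSID", col("CONTACTSTATUSID").cast(LongType))
```

With this in place, the load would become alignToCaseClass(sqlContext.read.format("jdbc").options(...).load()).as[CaseClass]. Since withColumn replaces an existing column under the same name, the column order stays the same, which matters because union matches columns by position. Note also that union returns a new Dataset rather than modifying acc, so its result has to be assigned.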