You can use repartition(colname) to redistribute a DataFrame by a column at runtime, or DataFrameWriter.partitionBy() to write the output into dynamic partitions on disk.
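A minimal sketch of the difference (df and the output path here are just placeholders):

import org.apache.spark.sql.functions.col

// In-memory shuffle: all rows with the same region_id land in the same partition
val byRegion = df.repartition(col("region_id"))

// On-disk layout: writes one directory per day value, e.g. .../day=2018-02-10/
df.write.partitionBy("day").parquet("/tmp/sensor_data_out")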
For example, if your table is defined as follows:
create table sensor_data (
  sensor_id bigint,
  temp float,
  region_id string,
  state string,
  country string
) partitioned by (day string)
If you want to run a per-region calculation for a particular day, first select that day's partition, then repartition by region so each call to processEvent sees one region's complete data:

val sensor_data = spark.sql("select * from sensor_data where day='2018-02-10'")
import org.apache.spark.sql.functions.col
import spark.implicits._  // provides the Encoder for the Event case class

val results = sensor_data.
  repartition(col("region_id")).
  mapPartitions( eventIter => {
    processEvent(eventIter).iterator
  })
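results is a Dataset[Event], so you can, for example, write it back out with dynamic partitions as well (the output path is made up):

results.write.
  partitionBy("country").
  parquet("/tmp/max_temp_by_country")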
Event and processEvent are defined as:

import org.apache.spark.sql.Row
import scala.collection.mutable.ListBuffer

case class Event(sensor_id: Long, country: String, max_temp: Float)

def processEvent(evtIter: Iterator[Row]) : List[Event] = {
  val maxTempEvents = ListBuffer[Event]()
  while (evtIter.hasNext) {
    val evt = evtIter.next()
    // do your calculation and add results to the maxTempEvents list
  }
  maxTempEvents.toList
}
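As one possible body for the placeholder above (a sketch only, assuming you want the maximum temperature per country within each partition; the column names match the table definition):

def processEvent(evtIter: Iterator[Row]): List[Event] = {
  // Track the hottest reading seen so far for each country in this partition
  val maxByCountry = scala.collection.mutable.Map[String, Event]()
  while (evtIter.hasNext) {
    val evt = evtIter.next()
    val id      = evt.getAs[Long]("sensor_id")
    val temp    = evt.getAs[Float]("temp")
    val country = evt.getAs[String]("country")
    // Keep this reading if it is the first, or the hottest, for its country
    if (maxByCountry.get(country).forall(temp > _.max_temp))
      maxByCountry(country) = Event(id, country, temp)
  }
  maxByCountry.values.toList
}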
Hope this helps.
Thanks
Ravi