
I have a Spark application written in Scala that works on a Dataset[Event], where Event is a user-defined type, something like this:

case class Event(timestamp: Long, state: String, source: String)

which I am transforming to this:

case class TransformedEvent(timestamp: Long, state: String, source: String, is_finished: Boolean)

Basically, I am adding one field "is_finished" based on the other fields.

Example: is_finished = true if state == "state1" AND source == "source1", and so on for other combinations.

For a better explanation, here is the code:

val events: Dataset[Event] = getEvents()

// Here is the transformation
val transformedEvents: Dataset[TransformedEvent] = events.map(e => convert(e))

// where the convert function is something like this
def convert(event: Event): TransformedEvent = {
    // The hard-coded condition that I would like to move into config
    val isFinished = event.state == "state1" && event.source == "source1"

    TransformedEvent(timestamp = event.timestamp,
                     state = event.state,
                     source = event.source,
                     is_finished = isFinished)
}

I am trying to figure out a way to make conditions like event.state == "state1" && event.source == "source1" config-driven. I might have to add, delete, or update these conditions in the future, and I do not want to change the code and redeploy each time that happens.
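To make the goal concrete, here is a minimal sketch of one possible shape for this, in plain Scala without Spark. It assumes a hypothetical rule file with one rule per line, written as comma-separated field=value pairs (for example state=state1,source=source1). The names Rule, ConfigDrivenRules, parseRules and isFinished are made up for illustration; is_finished is true when every condition of at least one rule matches.

```scala
// Sketch only: the rule format and the names Rule, ConfigDrivenRules,
// parseRules and isFinished are assumptions, not an existing API.
case class Event(timestamp: Long, state: String, source: String)
case class TransformedEvent(timestamp: Long, state: String, source: String, is_finished: Boolean)

// A rule is a set of field -> expected value pairs that must all match (AND).
case class Rule(conditions: Map[String, String])

object ConfigDrivenRules {
  // Fields of Event that a rule is allowed to reference.
  private val fieldReaders: Map[String, Event => String] = Map(
    "state"  -> ((e: Event) => e.state),
    "source" -> ((e: Event) => e.source)
  )

  // Parses one rule per line, e.g. "state=state1,source=source1".
  // Blank lines and lines starting with '#' are skipped.
  def parseRules(text: String): List[Rule] =
    text.split("\n").toList
      .map(_.trim)
      .filter(line => line.nonEmpty && !line.startsWith("#"))
      .map { line =>
        val pairs = line.split(",").toList.map { kv =>
          kv.split("=", 2).map(_.trim) match {
            case Array(k, v) if fieldReaders.contains(k) => k -> v
            case _ => throw new IllegalArgumentException(s"Bad condition: $kv")
          }
        }
        Rule(pairs.toMap)
      }

  // True if at least one rule has all of its conditions satisfied by the event.
  def isFinished(event: Event, rules: List[Rule]): Boolean =
    rules.exists { rule =>
      rule.conditions.forall { case (field, expected) =>
        fieldReaders(field)(event) == expected
      }
    }

  // Same shape as the original convert, with the condition taken from the rules.
  def convert(event: Event, rules: List[Rule]): TransformedEvent =
    TransformedEvent(event.timestamp, event.state, event.source, isFinished(event, rules))
}

// Usage: in the real job the text would be read from the config file instead.
val rules = ConfigDrivenRules.parseRules(
  "# finished combinations\nstate=state1,source=source1\nstate=state2,source=source2")
val finished    = ConfigDrivenRules.convert(Event(1L, "state1", "source1"), rules)
val notFinished = ConfigDrivenRules.convert(Event(2L, "state1", "source2"), rules)
```

In the Spark job, the rule list would presumably be loaded once on the driver and captured by the closure passed to events.map. An alternative would be to keep each condition as a SQL expression string in the config and evaluate it with org.apache.spark.sql.functions.expr, at the cost of leaving the typed map.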

Can anyone point me in the right direction?

Thanks in advance.

white-hawk-73
  • If you are going to change the condition or a config variable, you will have to redeploy anyway, right? You can keep this list in application.conf and read it from there, or, better, pass it as an argument in the spark-submit command. – Raman Mishra May 11 '20 at 21:16
  • Nope, I can put the config file in HDFS and modify only that file rather than deploying the application. I want to understand how to put this in config and fetch it from there and use it – white-hawk-73 May 11 '20 at 22:02
  • Does this answer your question? [Scala jar read external properties file](https://stackoverflow.com/questions/38583510/scala-jar-read-external-properties-file) – Raman Mishra May 11 '20 at 22:19
  • Could you please share a snippet of your code? Especially the piece where you are transforming and need to make things config-driven. – venBigData May 12 '20 at 01:36
  • I know how to read a config file. What I am asking is how to make some of the expressions that I want to evaluate on my Spark Dataset config-driven. – white-hawk-73 May 12 '20 at 13:43
  • @venBigData I have updated the question with the code. Basically, I am just looking to have boolean conditions in my code fetched from config and evaluated at run time. – white-hawk-73 May 12 '20 at 14:00
