To apply some function to stream in trident storm we pass newly created instance to each
method which is called on stream like this:
stream.each(inputFields, new SomeFunc(), outputFields)
where SomeFunc
is a descendant of BaseFunc.
Suppose I want to have some state variable in SomeFunc
:
class SomeFunc extends BaseFunction {
var someState: String = _
override def execute(tuple: TridentTuple, collector: TridentCollector) = ???
}
If I set parallelism hint to some value greater than 1 for SomeFunc component will storm create multiple instances of SomeFunc
? Is accessing/updating someState in SomeFunc a thread safe operation? If instead of defining SomeClass as class I define it as an object, will smth be changed?
EDIT Ok, with help of user @Shaw in comments to his answer I learned that storm creates one instance of storm component(storm/bolt/function/aggregator etc.) per executor. The question is how does it do this? I want to know the mechanism of this behaviour