I have two use cases: 1) traversing a database cursor, and 2) implementing a data-scraping framework similar to Python's Scrapy. In both cases I'd simply like to write the code using for loops and yields.
My two questions are:
1) The important one: how do I implement generators for IO so that they retrieve data one element at a time and look somewhat like Python generators? The Scala Stream examples I've found focus on mathematical problems, and the "How do I implement a Python generator" answers don't go into enough detail :D
2) Some of the generator functions will return a single entity and others a stream of entities. How do I give them a common signature without explicitly wrapping every single return value in a one-element Seq?
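For question 1, one common approach is to wrap the pull-based source in a plain Iterator: elements are fetched only when the consumer asks for the next one, which is roughly what a Python generator gives you. Below is a minimal sketch, assuming Scala 2.13; fromCursor and rows are hypothetical helper names, and the fetch function is assumed to return None once the source is exhausted.

```scala
import java.sql.ResultSet

object Generators {
  // Turn any "fetch the next element, or None when exhausted" function into a
  // lazy Iterator. fetch is called only when the consumer asks for more, and
  // never again after it has returned None.
  def fromCursor[A](fetch: () => Option[A]): Iterator[A] =
    Iterator.continually(fetch()).takeWhile(_.isDefined).collect { case Some(a) => a }

  // The same idea applied to a JDBC ResultSet: rs.next() advances the cursor
  // inside takeWhile, and read extracts the current row. The iterator must be
  // consumed before the ResultSet is closed.
  def rows[A](rs: ResultSet)(read: ResultSet => A): Iterator[A] =
    Iterator.continually(rs).takeWhile(_.next()).map(read)
}
```

Because both helpers return an ordinary Iterator, the result can be consumed with a for loop, map, take, and so on, and nothing is read from the source until an element is requested.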
Example (from my prototyping):
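    import org.jsoup.nodes.Element

    abstract class Expression
    case class Visit(url: String) extends Expression
    // The function list needs a parameter name; "actions" is a placeholder
    case class SelectMultiple(selector: String,
                              actions: List[Element => ExpressionResult]) extends Expression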
Visit will most likely return a single result, whereas SelectMultiple will emit URLs as well as extracted items to be written to disk.
The result types are:
    abstract class ExpressionResult
    case class ExtractedEntity(entity: Entity) extends ExpressionResult
    case class ExtractedRequest(url: String) extends ExpressionResult
    case class Continue() extends ExpressionResult
    case class Ignore() extends ExpressionResult
I currently have:
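    import org.jsoup.Jsoup

    case class Visit(url: String, andDo: ScrapeExpression) extends ScrapeExpression {
      override def execute: ExpressionResult = {
        val doc = Jsoup.connect(url).get() // fetch and parse the page
        andDo.element = doc                // hand the document to the nested expression
        println("visiting")
        andDo.execute                      // note: this result is currently discarded
        Continue()
      }
    }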
But for SelectMultiple I have to change the signature of execute to Seq[ExpressionResult].
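For question 2, one option is to give every executor the same lazy return type, Iterator[ExpressionResult]: multi-result executors build their iterators lazily, and single-result executors can return a bare result that an implicit conversion lifts with Iterator.single. This is a minimal sketch, assuming Scala 2.13; the Entity stand-in, the Executor trait, the VisitExec/SelectMultipleExec classes, and the liftSingle conversion are all hypothetical, and the selected links and items are passed in rather than extracted with Jsoup.

```scala
import scala.language.implicitConversions

// Minimal stand-ins for the question's types (Entity is a placeholder here).
case class Entity(data: String)
sealed abstract class ExpressionResult
case class ExtractedEntity(entity: Entity) extends ExpressionResult
case class ExtractedRequest(url: String) extends ExpressionResult
case class Continue() extends ExpressionResult

object ExpressionResult {
  // Hypothetical convenience: lets a method declared to return an
  // Iterator[ExpressionResult] simply return one ExpressionResult.
  implicit def liftSingle(r: ExpressionResult): Iterator[ExpressionResult] =
    Iterator.single(r)
}

trait Executor {
  def execute: Iterator[ExpressionResult]
}

// One result: the implicit conversion wraps Continue() (the fetch is omitted).
class VisitExec(url: String) extends Executor {
  def execute: Iterator[ExpressionResult] = Continue()
}

// Many results: produced lazily, one element at a time.
class SelectMultipleExec(links: Seq[String], items: Seq[String]) extends Executor {
  def execute: Iterator[ExpressionResult] =
    links.iterator.map(ExtractedRequest(_)) ++
      items.iterator.map(s => ExtractedEntity(Entity(s)))
}
```

If the implicit conversion feels too magical, writing Iterator.single(Continue()) explicitly keeps the same signature at the cost of one wrapper call.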
P.S. The prototype above puts the execution logic inside the case classes; I will move it out of the expression tree once I figure out how to write the generators.
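Once the logic lives outside the expression tree, the Scrapy-style crawl loop itself can be a generator: keep a frontier of pending URLs, emit extracted items lazily, and feed every extracted request back into the frontier. A minimal sketch, assuming Scala 2.13; crawl, the Result/Item/Request types, and fetchAndParse are hypothetical, with fetchAndParse standing in for the Jsoup fetch plus the expression interpreter.

```scala
import scala.collection.mutable

object Crawler {
  sealed trait Result
  final case class Item(value: String) extends Result
  final case class Request(url: String) extends Result

  // Lazily crawls from `start`. fetchAndParse stands in for "download the page
  // and run the expression tree on it"; the next page is fetched only after
  // the consumer has pulled every item produced by the current one.
  def crawl(start: String)(fetchAndParse: String => Iterator[Result]): Iterator[String] = {
    val frontier = mutable.Queue(start) // URLs waiting to be visited
    val seen     = mutable.Set(start)   // URLs already queued, to avoid loops

    Iterator.continually(()).takeWhile(_ => frontier.nonEmpty).flatMap { _ =>
      fetchAndParse(frontier.dequeue()).flatMap {
        case Item(v) => Iterator.single(v)
        case Request(url) =>
          if (seen.add(url)) frontier.enqueue(url) // follow each link only once
          Iterator.empty
      }
    }
  }
}
```

The consumer decides how far the crawl goes: taking the first few items visits only as many pages as needed to produce them.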
Possibly related:
How to implement lazy sequence (iterable) in scala?
Functionally processing a database cursor in Scala
Treating an SQL ResultSet like a Scala Stream
Train of thought:
I have a feeling that modelling this as an Iterator is my best bet, but I have not yet looked into how iterators fit into for expressions. Will the iterator traits and templates in the standard library interoperate well with a for expression, and even if they do, will they cover the streaming use case?
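On the for-expression question: a for/yield is syntax for withFilter, flatMap, and map, and Iterator implements all of them lazily, so a for/yield over an Iterator is itself a lazy Iterator, much like a Python generator expression. A small sketch, assuming Scala 2.13; the pages source and the fetches counter are hypothetical stand-ins for real IO.

```scala
object ForOverIterator {
  var fetches = 0 // counts simulated IO calls

  // Stand-in for an IO source: each page is "fetched" only when requested.
  def pages(n: Int): Iterator[String] =
    Iterator.range(1, n + 1).map { i => fetches += 1; s"page-$i" }

  // Desugars to pages(1000).withFilter(...).flatMap(p => Iterator(...).map(...)),
  // all of which are lazy on Iterator.
  def links: Iterator[String] =
    for {
      p    <- pages(1000)
      if !p.endsWith("3")
      link <- Iterator(s"$p/a", s"$p/b")
    } yield link
}
```

Taking the first five links fetches only four of the thousand pages. Iterator is single-pass and keeps nothing it has already produced; LazyList (the 2.13 replacement for Stream) memoizes its elements, so holding on to its head retains everything read so far, which is usually undesirable for a large cursor.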
Should I be writing the IO code in Java (shoot me now :~( )?