
I have two use cases: 1) traversing a database cursor, and 2) implementing a data-scraping framework similar to Python's Scrapy. I'd simply like to write this code using for loops and yield.
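For the database-cursor case, one option is to wrap the cursor in a Scala `Iterator`, which retrieves one row at a time and works in a `for` loop. A minimal sketch, assuming a cursor shaped like `java.sql.ResultSet` (a boolean `next()` plus getters). `Cursor`, `InMemoryCursor` and `CursorIterators.rows` are hypothetical names, and the in-memory cursor exists only so the example runs without a database:

```scala
// Hypothetical cursor interface modelled on java.sql.ResultSet:
// next() moves to the following row and reports whether one exists.
trait Cursor {
  def next(): Boolean
  def getString(column: String): String
}

// In-memory stand-in so the sketch runs without a database.
final class InMemoryCursor(rows: Vector[Map[String, String]]) extends Cursor {
  private var pos = -1
  def next(): Boolean = { pos += 1; pos < rows.length }
  def getString(column: String): String = rows(pos)(column)
}

object CursorIterators {
  // Wrap a cursor in a lazy Iterator: the cursor is advanced only when
  // the consumer asks whether another element exists.
  def rows[A](cursor: Cursor)(read: Cursor => A): Iterator[A] =
    new Iterator[A] {
      private var advanced = false  // cursor already moved for the pending row?
      private var available = false // did that move land on a row?

      def hasNext: Boolean = {
        if (!advanced) {
          available = cursor.next()
          advanced = true
        }
        available
      }

      def next(): A = {
        if (!hasNext) throw new NoSuchElementException("cursor exhausted")
        advanced = false // the next hasNext call will move the cursor again
        read(cursor)     // read the current row before the cursor moves on
      }
    }
}
```

Consumption then reads like a generator loop: `for (name <- CursorIterators.rows(cursor)(_.getString("name"))) println(name)`. A more compact equivalent is `Iterator.continually(cursor).takeWhile(_.next()).map(read)`; the explicit `hasNext`/`next` version just makes it visible that each row is fetched on demand. Closing the cursor and connection is left to the caller.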

My two questions are:

1) The important one: how do I implement generators for IO so that they retrieve data one element at a time and look somewhat like Python generators? The Scala Stream examples focus on mathematical problems, and the "How do I implement a Python generator" answers don't go into enough detail :D

2) Some generator functions will return a single entity and others a stream of entities. How do I give them a common signature without explicitly wrapping each single value in a one-element Seq?
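For the second question, one way to avoid wrapping single values by hand is to give every step the same lazy return type, `Iterator[...]`, and let an implicit conversion lift a bare result into a one-element iterator. A sketch under that assumption (Scala 2.13 syntax; `Result`, `Item`, `Request`, `Results.single`, `fetchOne` and `extractAll` are hypothetical names, not the types from the prototype below):

```scala
import scala.language.implicitConversions

sealed trait Result
final case class Item(value: String) extends Result
final case class Request(url: String) extends Result

object Results {
  // Lifts a bare Result into a one-element Iterator wherever an
  // Iterator[Result] is expected.
  implicit def single(r: Result): Iterator[Result] = Iterator.single(r)

  // Yields exactly one result; the conversion above does the wrapping.
  def fetchOne(url: String): Iterator[Result] = Request(url)

  // Yields many results lazily, with the same return type.
  def extractAll(texts: List[String]): Iterator[Result] =
    texts.iterator.map(t => Item(t))
}
```

Implicit conversions are often discouraged because they hide what is happening; writing `Iterator.single(x)` explicitly is a small price that keeps the shared signature without the magic.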

Example (from my prototyping):

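import org.jsoup.nodes.Element

abstract class Expression
case class Visit(val url: String) extends Expression
// `actions` are applied to the elements matched by `selector`
case class SelectMultiple(selector: String,
    actions: List[Element => ExpressionResult]) extends Expression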

Visit will most likely return a single result, whereas SelectMultiple will emit URLs and extracted items to be written to disk.

With:

abstract class ExpressionResult
case class ExtractedEntity(val entity: Entity) extends ExpressionResult
case class ExtractedRequest(val url: String) extends ExpressionResult
case class Continue() extends ExpressionResult
case class Ignore() extends ExpressionResult
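With result types like these, the Scrapy-style loop can itself be a lazy `Iterator`: extracted requests go onto a queue of pending URLs, and extracted entities are handed to the consumer. A sketch, assuming a hypothetical per-page function `scrape: String => Iterator[ExpressionResult]` that does the fetching and parsing, with `Entity` replaced by `String` so it compiles on its own (`Continue` and `Ignore` are omitted):

```scala
import scala.collection.mutable

// Simplified versions of the result types above (Entity replaced by String).
sealed trait ExpressionResult
final case class ExtractedEntity(entity: String) extends ExpressionResult
final case class ExtractedRequest(url: String) extends ExpressionResult

object Crawler {
  // Lazily crawl from startUrl. `scrape` (hypothetical) fetches and parses
  // one page; requests it emits are queued, entities are passed through.
  def crawl(startUrl: String)(scrape: String => Iterator[ExpressionResult]): Iterator[String] = {
    val frontier = mutable.Queue(startUrl) // URLs waiting to be scraped
    val seen = mutable.Set(startUrl)       // URLs already queued once

    Iterator
      .continually(frontier.nonEmpty)      // re-checked each time a new page is needed
      .takeWhile(hasWork => hasWork)
      .flatMap { _ =>
        scrape(frontier.dequeue()).flatMap {
          case ExtractedEntity(entity) => Iterator.single(entity)
          case ExtractedRequest(url) =>
            if (seen.add(url)) frontier.enqueue(url) // schedule unseen URLs
            Iterator.empty
        }
      }
  }
}
```

Because the pipeline is built from `Iterator.continually`, `takeWhile` and `flatMap`, a page is scraped only after the consumer has used up the entities from the pages before it, which is roughly the pull-based behaviour of a Python generator. The `seen` set keeps pages that link to each other from being crawled twice.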

I have:

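import org.jsoup.Jsoup

// ScrapeExpression is assumed to declare `var element` and `def execute`.
case class Visit(url: String, andDo: ScrapeExpression) extends ScrapeExpression {
  override def execute: ExpressionResult = {
    val doc = Jsoup.connect(url).get() // fetch and parse the page (blocking)
    andDo.element = doc                // hand the parsed document to the nested expression
    println("visiting")
    andDo.execute                      // the nested result is discarded here
    Continue()
  }
}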

But for SelectMultiple I have to change the signature of execute to Seq[ExpressionResult].
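Instead of `ExpressionResult` for some nodes and `Seq[ExpressionResult]` for others, every node could return `Iterator[ExpressionResult]`. A sketch of that shape, assuming hypothetical `Page`/`Element` stand-ins rather than Jsoup's `Document`/`Element`, a `fetch` function passed into `Visit`, and `Ignore` as a case object:

```scala
// Hypothetical stand-ins for a parsed page and its elements
// (with Jsoup these roles would be played by Document and Element).
final case class Element(text: String)
trait Page { def select(selector: String): Seq[Element] }

sealed trait ExpressionResult
final case class ExtractedEntity(entity: String) extends ExpressionResult
case object Ignore extends ExpressionResult

sealed trait Expression {
  // One signature for every node: zero, one or many results, produced lazily.
  def execute(page: Page): Iterator[ExpressionResult]
}

final case class SelectMultiple(selector: String, actions: List[Element => ExpressionResult])
    extends Expression {
  // Every action applied to every matched element, one result at a time.
  def execute(page: Page): Iterator[ExpressionResult] =
    page.select(selector).iterator.flatMap(el => actions.iterator.map(action => action(el)))
}

final case class Visit(url: String, andDo: Expression, fetch: String => Page) extends Expression {
  // Fetches `url` only when the first result is pulled, then streams
  // the nested expression's results; the current page is not used.
  def execute(page: Page): Iterator[ExpressionResult] =
    Iterator.single(url).flatMap(u => andDo.execute(fetch(u)))
}
```

A node with exactly one result can simply return `Iterator.single(r)`, so single-valued and multi-valued nodes share one signature.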

P.S. The prototype above puts the executors inside the case classes; I will move the logic out of the expression tree once I figure out how to write the generators.

Possibly related:

How to implement lazy sequence (iterable) in scala?

Stream vs Views vs Iterators

Functionally processing a database cursor in Scala

Treating an SQL ResultSet like a Scala Stream

Train of thought:

I have a feeling that modelling an Iterator is probably my best bet here, but I have not yet looked into how iterators fit into for expressions. Will the library's iterator traits and templates interoperate well with a for expression, and even if they do, will they cover the streaming use case?
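On the for-expression question: a `for`/`yield` over an `Iterator` desugars to `withFilter`, `flatMap` and `map`, which `Iterator` implements lazily, so the result is again an `Iterator` and nothing runs until it is consumed. A small sketch that logs when each element is produced (`ForOverIterator`, `pages` and `expand` are illustrative names):

```scala
import scala.collection.mutable.ArrayBuffer

object ForOverIterator {
  // Records when each element is actually produced.
  val log = ArrayBuffer.empty[String]

  // Illustrative lazy source: element i is "fetched" only when pulled.
  def pages(n: Int): Iterator[Int] =
    Iterator.range(0, n).map { i => log += s"fetch $i"; i }

  // Desugars to pages(n).withFilter(...).flatMap(p => Iterator(p, p * 10).map(q => q)),
  // all of which are lazy on Iterator, so the result is a lazy Iterator too.
  def expand(n: Int): Iterator[Int] =
    for {
      p <- pages(n)
      if p % 2 == 0
      q <- Iterator(p, p * 10)
    } yield q
}
```

One difference from a stream: an `Iterator` is single-pass. If elements need to be revisited, `LazyList` (Scala 2.13, replacing the deprecated `Stream`) memoizes what it has evaluated, at the cost of keeping it in memory. A `for` loop without `yield` desugars to `foreach` and consumes the iterator immediately.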

Should I be writing the IO code in Java (shoot me now :~( )?
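The IO itself does not require Java code: Scala can call Java's IO classes directly and wrap them in an `Iterator`. A minimal sketch that reads lines lazily from any `java.io.Reader` (`LineSource.lines` is an illustrative name):

```scala
import java.io.{BufferedReader, Reader}

object LineSource {
  // Lazily read lines from any java.io.Reader. readLine() returns null at
  // end of input, which is where takeWhile stops the iterator.
  def lines(reader: Reader): Iterator[String] = {
    val buffered = new BufferedReader(reader)
    Iterator.continually(buffered.readLine()).takeWhile(_ != null)
  }
}
```

The reader is not closed here; in real code the caller would close it, for example with `scala.util.Using` in 2.13. For files, `scala.io.Source.fromFile(path).getLines()` already returns a lazy `Iterator[String]`.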

Hassan Syed