
More details: I am new to Scala and Akka. I am trying to build a concurrent system that essentially does the following:

  • Read a CSV file
  • Parse it into groups
  • Load the groups into the appropriate database tables.

The file cannot be split into smaller files, so I am going with a normal, serialized read. I pass the parsed rows to a MasterWriter (an actor). It dynamically creates n actors called writers and passes each of them a chunk of the rows. Each writer is then responsible for reading its chunk, categorizing the rows, and inserting them into the appropriate table.
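Roughly, the setup looks like this (a simplified sketch; Chunk, Writer, and the router choice are just placeholders for my actual code):

import akka.actor.{Actor, ActorRef, Props}
import akka.routing.RoundRobinPool

// Placeholder message: one chunk of raw CSV lines
case class Chunk(lines: Seq[String])

// Each writer categorizes its chunk and inserts into the appropriate table
class Writer extends Actor {
  override def receive: Receive = {
    case Chunk(lines) =>
      // categorize the lines and insert into the appropriate table here
      ()
  }
}

// The MasterWriter fans chunks out to n writers via a round-robin router
class MasterWriter(n: Int) extends Actor {
  private val writers: ActorRef =
    context.actorOf(RoundRobinPool(n).props(Props[Writer]), "writers")

  override def receive: Receive = {
    case chunk: Chunk => writers ! chunk
  }
}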

My doubt is: when two writers write concurrently to the same table, will that lead to a race condition? Also, how else could this problem be modeled to increase speed? Any help in any direction would be really useful. Thanks

Nanda
    what kind of table do you use? – Alexei Kaigorodov Jun 04 '17 at 09:31
  • A relational table. Sybase. – Nanda Jun 04 '17 at 11:08
  • I would have one actor per table, and only allow that actor to write into that table. You would now have n readers and m writers (where m is the number of tables). That has the advantage that you (well, the actor system really) would be in control of any concurrency issues, rather than the Sybase client library. You could conceivably keep a transaction open within that writer which would probably (note probably) speed things up. – Phasmid Jun 04 '17 at 17:20
  • Try this stack solution: https://stackoverflow.com/questions/36400152/how-are-reactive-streams-used-in-slick-for-inserting-data – Ramón J Romero y Vigil Jun 04 '17 at 17:41
  • Thanks @Phasmid, that is a good idea. But the requirements are such that I handle all 3 tables within the same actor, because the entry into the second table depends on the first, and the third depends on the second. So it would be better to group all 3 into one actor. Is there a way given this constraint? – Nanda Jun 06 '17 at 17:03

1 Answer


Modelling the Data Access

I have found that the biggest key to designing this sort of task is to abstract away the database. You should treat any database update as a simple function that returns success or failure:

type UpdateResult = Boolean

val UpdateSuccess: UpdateResult = true
val UpdateFailure: UpdateResult = false

// Placeholder: substitute the type of a parsed CSV row
type Data = String

type Updater = Data => UpdateResult

This allows you to write an Updater that goes to an actual database, or a test updater that always returns success:

import java.sql.Statement

val statement: Statement = ???

val dbUpdater: Updater = (data) => {
  // executeUpdate returns the number of affected rows
  statement.executeUpdate(s"INSERT INTO ... ${data.toString}") > 0
}

val testUpdater : Updater = _ => UpdateSuccess

Akka Stream Implementation

For this particular use case I recommend Akka Streams instead of raw Actors. A solution using the stream paradigm can be found in the question linked in the comments above.
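As a rough illustration, here is a minimal sketch of such a stream, assuming 2.5-era Akka Streams APIs. It reuses the Data, Updater, and testUpdater definitions above; parseLine and the file name input.csv are placeholders:

import java.nio.file.Paths

import akka.actor.ActorSystem
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.{FileIO, Framing, Sink}
import akka.util.ByteString

object StreamLoader extends App {
  implicit val system: ActorSystem = ActorSystem("csv-loader")
  implicit val materializer: ActorMaterializer = ActorMaterializer()
  import system.dispatcher

  // Placeholder parser from a raw CSV line to Data (the String alias above)
  def parseLine(line: String): Data = line

  FileIO
    .fromPath(Paths.get("input.csv"))                 // read the file as bytes
    .via(Framing.delimiter(ByteString("\n"), 4096, allowTruncation = true)) // split into lines
    .map(_.utf8String)
    .map(parseLine)
    .map(testUpdater)      // swap in dbUpdater to hit a real database
    .runWith(Sink.ignore)
    .onComplete(_ => system.terminate())
}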

Akka Actor

An Actor solution is also possible:

import akka.actor.Actor

class UpdateActor(updater: Updater) extends Actor {

  override def receive: Receive = {
    case data: Data => sender() ! updater(data)
  }
}

The problem with Actors is that you'll have to write an Actor to read the file, other Actors to group the rows, and finally use the UpdateActor to send data to the database. You'll also have to wire all of those Actors together...
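As a rough illustration, a minimal sketch of that wiring, assuming the Data, Updater, UpdateActor, and testUpdater definitions above; FileReaderActor, the pool size, and the file name are hypothetical:

import akka.actor.{Actor, ActorRef, ActorSystem, Props}
import akka.routing.RoundRobinPool

// Hypothetical reader: sends each line of the file to a pool of UpdateActors
class FileReaderActor(updaters: ActorRef) extends Actor {
  override def receive: Receive = {
    case path: String =>
      scala.io.Source.fromFile(path).getLines().foreach { line =>
        updaters ! (line: Data) // Data is the String placeholder from above
      }
    case result: UpdateResult =>
      // UpdateActor replies here; collect successes/failures as needed
      ()
  }
}

object ActorWiring extends App {
  val system = ActorSystem("csv-loader")
  val updaters = system.actorOf(
    RoundRobinPool(4).props(Props(new UpdateActor(testUpdater))), "updaters")
  val reader = system.actorOf(Props(new FileReaderActor(updaters)), "reader")
  reader ! "input.csv"
}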

Ramón J Romero y Vigil