I am using Slick with Akka Streams to load a large number of records (~2M) from the database (PostgreSQL) and write them to an S3 file. However, I'm noticing that my code below works for around ~50k records but fails for anything over roughly the 100k mark.
val allResults: Future[Seq[MyEntityImpl]] =
MyRepository.getAllRecords()
val results: Future[MultipartUploadResult] = Source
.fromFuture(allResults)
.map(seek => seek.toList)
.mapConcat(identity)
.map(myEntity => myEntity.toPSV + "\n")
.map(s => ByteString(s))
.runWith(s3Sink)
Below is a sample of what MyEntityImpl looks like:
case class MyEntityImpl(partOne: MyPartOne, partTwo: MyPartTwo) {
  def toPSV: String = partOne.toPSV + partTwo.toPSV
}
case class MyPartOne(field1: String, field2: String) {
  def toPSV: String = s"$field1|$field2"
}
case class MyPartTwo(field1: String, field2: String) {
  def toPSV: String = s"$field1|$field2"
}
I am looking for a more reactive way to do this so that it doesn't run out of memory.
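My suspicion is that Source.fromFuture forces the entire result Seq into memory before the stream even starts. For reference, this is a rough sketch of the direction I think is needed: streaming rows directly out of Slick with db.stream instead of collecting them into a Future[Seq]. The db handle, the myEntityTable query, and the fetch size are assumptions about what MyRepository wraps internally; s3Sink and the implicit materializer are the same as in the snippet above.

import akka.stream.scaladsl.Source
import akka.util.ByteString
import slick.jdbc.PostgresProfile.api._
import slick.jdbc.{ResultSetConcurrency, ResultSetType}

// `db` and `myEntityTable` are placeholders for whatever MyRepository uses internally.
val streamingAction =
  myEntityTable.result
    .withStatementParameters(
      rsType = ResultSetType.ForwardOnly,
      rsConcurrency = ResultSetConcurrency.ReadOnly,
      fetchSize = 10000)   // PostgreSQL only streams when a non-zero fetch size is set
    .transactionally       // and the statement runs inside a transaction

val results =
  Source
    .fromPublisher(db.stream(streamingAction))   // DatabasePublisher is a Reactive Streams Publisher
    .map(myEntity => ByteString(myEntity.toPSV + "\n"))
    .runWith(s3Sink)                             // same S3 multipart upload sink as above

Is something along these lines the right approach, or is there a better way to keep memory bounded here?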