
I have a file named "mappings.txt" with the following format:

k->v

To read this file into a Map I use:

import scala.io.Source

val file = Source.fromFile("mappings.txt").getLines().filter(f => !f.trim.isEmpty)
val map = file.map(m2 => (m2.split("->")(0), m2.split("->")(1))).toMap

How can I read the file into a Map when some of the values span multiple lines? For example:

k -> v \n 
       test 
       \n here 
k2 -> v2
blue-sky

2 Answers


This seemed to work for me, given the very limited test data to work with.

val myMap = io.Source
              .fromFile("junk.txt")             // open the file
              .mkString                         // turn it into one long String
              .split("(?=\\n\\S+\\s*->)")       // a non-consuming split
              .map(_.trim.split("\\s*->\\s*"))  // split each element at "->"
              .map(arr => (arr(0)->arr(1)))     // from 2-element Array to tuple
              .toMap                            // from tuple to k->v pair

Result:

scala> myMap("k")
res0: String =
v \n
       test
       \n here

scala> myMap("k2")
res1: String = v2
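
To see why the lookahead split keeps continuation lines attached to their key, here is a minimal sketch on an in-memory string. The sample text and the names `text`, `chunks` and `parsed` are illustrative assumptions. The limit of 2 on the inner split is also an addition, so that a later "->" inside a value would stay in the value.

```scala
// Hypothetical sample standing in for the file contents.
val text = "k -> v\n  test\n  here\nk2 -> v2\n"

// (?=...) is a zero-width lookahead: the split happens just before a newline
// that starts a "key ->" line, and no characters are consumed at that point.
val chunks = text.split("(?=\\n\\S+\\s*->)")

val parsed = chunks
  .map(_.trim.split("\\s*->\\s*", 2))     // at most two parts: key and value
  .collect { case Array(k, v) => k -> v } // skip chunks that have no "->"
  .toMap
```

Indented continuation lines do not start with a non-whitespace token followed by "->", so no split occurs in front of them and they stay in the value of the preceding key.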
jwvh

Below is a tail-recursive function that groups your input lines in the specified way.

The idea is simple: process the input line by line. When a key->value pair is encountered, add it to the buffer (the accumulator). When a line doesn't look like a k->v pair, append it to the value of the most recent pair already in the buffer.

val s =
  """k -> v \n
    |       test
    |       \n here
    |k2 -> v2
  """.stripMargin.split("\n").toList


def rec(input:List[String]):Map[String, String] = {
  val ARROW = "\\s*(.+?)\\s*->\\s*(.+?)\\s*".r

  def r0(in:List[String], accum:List[(String, List[String])]):List[(String, List[String])] = in match {
    // end of input, reverse line accumulators
    case Nil => accum.map{case (k, lines) => k -> lines.reverse}.reverse

    // key -> value   line encountered, adding new k->v pair to outer accumulator
    case ARROW(k, v) :: tail => r0(tail, (k, List(v)) :: accum)

    // line without key encountered, adding this line to previous  k->v pair in the accumulator
    case line :: tail => r0(tail, accum match {
      case (k, lines) :: accTail => (k, line :: lines) :: accTail
      case _ => accum  // if accum is empty and input doesn't have a key, ignore line
    })
  }

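  // join each key's lines; avoids mapValues, which returns a MapView (not a Map) on Scala 2.13+
  r0(input, Nil).map { case (k, lines) => k -> lines.mkString("\n") }.toMap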
}

rec(s).foreach(println(_))

Result:

(k,v \n
   test
   \n here)
(k2,v2
)

Each line is processed exactly once, and each addition to or modification of the buffer is O(1), so the whole process is O(N).

Also, please note that you're reading the file in a way that leaves the resource open: the Source is never closed. Please refer to this for details.
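
One possible way to avoid the leak is sketched below, assuming Scala 2.13 or later, where `scala.util.Using` is available. The helper name `readLines` is hypothetical and only used for illustration.

```scala
import scala.io.Source
import scala.util.Using

// Reads all lines eagerly, then closes the Source, even if reading throws.
// `readLines` is a hypothetical helper name, not a library function.
def readLines(path: String): List[String] =
  Using.resource(Source.fromFile(path))(_.getLines().toList)
```

The resulting list can then be passed to the function above, for example `rec(readLines("mappings.txt"))`. The lines are materialized with `toList` before the source is closed, because the iterator returned by `getLines()` cannot be read once the underlying file has been closed.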

Aivean