Given a CSV in the format below, what is the best way to load it into Scala as a Map[String, Array[String]], with the keys being the unique values of Col2 and each value an Array[String] of all co-occurring values of Col1?

a,1,
b,2,m
c,2,
d,1,
e,3,m
f,4,
g,2,
h,3,
I,1,
j,2,n
k,2,n
l,1,
m,5,
n,2,

I have tried the function below, but I get an error when trying to add to the Option type: value += is not a member of Option[Array[String]]

In addition, I get "overloaded method value ++ with alternatives:" for the line case None => mapping ++ (linesplit(2) -> Array(linesplit(1)))

def parseCSV() : Map[String, Array[String]] = {
    var mapping = Map[String, Array[String]]()
    val lines = Source.fromFile("test.csv")
    for (line <- lines.getLines) {
      val linesplit = line.split(",")
      mapping.get(linesplit(2)) match {
        case Some(_) => mapping.get(linesplit(2)) += linesplit(1)
        case None => mapping ++ (linesplit(2) -> Array(linesplit(1)))
      }
    }
    mapping
  }
}

I am hoping for a Map[String, Array[String]] like the following:

(2 -> Array("b", "c", "g", "j", "k", "n"))
(3 -> Array("e", "h"))
(4 -> Array("f"))
(5 -> Array("m"))
Chris
    Possible duplicate of [How can I read a CSV file and put its content in a Map in Scala?](https://stackoverflow.com/questions/17238134/how-can-i-read-a-csv-file-and-put-its-content-in-a-map-in-scala) – Jeffrey Chung Sep 12 '19 at 10:08

3 Answers

You can do the following. First, read the file into a List[List[String]]:

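import scala.io.Source

// `using` is not part of the standard library, so close the source explicitly.
val source = Source.fromFile("test.csv")
val rows: List[List[String]] =
  try source.getLines().toList.map(_.split(",").map(_.trim).toList)
  finally source.close()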

Then filter out rows with fewer than two fields (blank or malformed lines), because both Col1 and Col2 are needed:

val filteredRows = rows.filter(row => row.size > 1)

The last step is to groupBy the second field (Col2) and keep only the first field (Col1) of each row:

// Map[String, Array[String]]: Col2 value -> all Col1 values seen with it
filteredRows.groupBy(row => row(1)).map { case (key, group) => key -> group.map(_.head).toArray }
Gal Naor

This isn't complete, but it should give you an outline of how it might be done.

io.Source
  .fromFile("so.txt")    //open file
  .getLines()            //line by line
  .map(_.split(","))     //split on commas
  .toArray               //load into memory
  .groupMap(_(1))(_(0))  //Scala 2.13

//res0: Map[String,Array[String]] = Map(4 -> Array(f), 5 -> Array(m), 1 -> Array(a, d, I, l), 2 -> Array(b, c, g, j, k, n), 3 -> Array(e, h))

You'll notice that the file resource isn't closed, and it doesn't handle malformed input. I leave that for the diligent reader.
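
For what it's worth, here is one way those two gaps might be filled. It is only a sketch, assuming Scala 2.13 (for groupMap and scala.util.Using) and assuming that a "malformed" line simply means one with fewer than two fields or an empty Col2. The names CsvGrouping, groupCol1ByCol2 and loadCsv are made up for illustration.

```scala
import scala.io.Source
import scala.util.Using

object CsvGrouping {
  // Hypothetical helper: groups Col1 values by Col2.
  // Lines with fewer than two fields, or an empty Col2, are skipped.
  def groupCol1ByCol2(lines: Iterator[String]): Map[String, Array[String]] =
    lines
      .map(_.split(",").map(_.trim))                              // split on commas
      .filter(fields => fields.length >= 2 && fields(1).nonEmpty) // drop blank or malformed lines
      .toArray                                                    // load into memory
      .groupMap(_(1))(_(0))                                       // Scala 2.13+

  // Hypothetical wrapper: Using.resource closes the Source even if parsing throws.
  def loadCsv(path: String): Map[String, Array[String]] =
    Using.resource(Source.fromFile(path))(source => groupCol1ByCol2(source.getLines()))
}
```

Calling CsvGrouping.loadCsv("so.txt") should then produce the same kind of Map as above. Taking an Iterator[String] in the helper keeps the grouping logic testable without touching the file system.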

jwvh

The code above needs a mutable Map and an ArrayBuffer, because both collections are updated in place while the file is read.

import scala.collection.mutable
import scala.collection.mutable.ArrayBuffer
import scala.io.Source

def parseCSV(): Map[String, Array[String]] = {
  val mapping = mutable.Map[String, ArrayBuffer[String]]()
  val source = Source.fromFile("test.csv")
  try {
    for (line <- source.getLines()) {
      val linesplit = line.split(",")
      // Col2 is the key and Col1 the value; skip lines with fewer than two fields
      if (linesplit.length >= 2) {
        val key = linesplit(1)
        val value = linesplit(0)
        mapping.get(key) match {
          case Some(buffer) => buffer += value                // key seen before: append
          case None => mapping(key) = ArrayBuffer(value)      // first occurrence of this key
        }
      }
    }
  } finally source.close()
  // convert to the immutable result type
  mapping.map { case (k, v) => k -> v.toArray }.toMap
}
hagarwal