2

I have just started learning Scala, coming from Python.

I would like to read in a '|'-delimited file and preserve its tabular structure. Say I have a file that contains something like this:

1|2|3|4
5|6|7|8
9|10|11|12

I would like a function that would return a structure like this:

List(List(1, 2, 3, 4), List(5, 6, 7, 8), List(9, 10, 11, 12))

My code so far (it doesn't compile because of a type mismatch):

import scala.io.Source

def CSVReader(absPath:String, delimiter:String): List[List[Any]] = {
    println("Now reading... " + absPath)
    val MasterList = Source.fromFile(absPath).getLines().toList
    return MasterList
}

var ALHCorpus = "//Users//grant//devel//Scala-codes//ALHCorpusList"
var delimiter = "|"

var CSVContents = CSVReader(ALHCorpus, delimiter)
GrantD71

3 Answers

6

I would just use a CSV library for this sort of thing. When I had to do something similar, I used scala-csv.

If you do not want to do that though, couldn't you simply split by your delimiter? I.e.,

import scala.io.Source

def CSVReader(absPath:String, delimiter:String): List[List[Any]] = {
    println("Now reading... " + absPath)
    val MasterList = Source.fromFile(absPath).getLines().toList map {
        // String#split() takes a regex, so quote the delimiter to match it literally.
        _.split(java.util.regex.Pattern.quote(delimiter)).toList
    }
    return MasterList
}

var ALHCorpus = "//Users//grant//devel//Scala-codes//ALHCorpusList"
var delimiter = "|" // I changed your delimiter to pipe since that's what's in your sample data.

var CSVContents = CSVReader(ALHCorpus, delimiter)
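A slightly more defensive variant of the same split-based idea, as a minimal sketch rather than a definitive implementation. The names splitLine and readDelimited are hypothetical, not part of the answer above. It assumes the file has no quoted fields containing the delimiter; for those, a CSV library is the safer route.

```scala
import java.util.regex.Pattern
import scala.io.Source

// Split one line on a literal delimiter. Pattern.quote escapes any regex
// metacharacters, and the -1 limit keeps trailing empty fields.
def splitLine(line: String, delimiter: String): List[String] =
  line.split(Pattern.quote(delimiter), -1).toList

// Read a whole file into rows of string cells. The rows are materialized
// with toList before the Source is closed in the finally block.
def readDelimited(absPath: String, delimiter: String): List[List[String]] = {
  val source = Source.fromFile(absPath)
  try source.getLines().map(splitLine(_, delimiter)).toList
  finally source.close()
}
```

Declaring the result as List[List[String]] instead of List[List[Any]] keeps the element type precise, since split always yields strings.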
Jack Leow
2

To start with, I would let the type be inferred by not specifying a return type. Once you get the proper results, start constraining the return type and adjusting what CSVContents returns accordingly. This will fix the type error.

def CSVReader(absPath:String, delimiter:String) = { ...}

CSVContents then returns this:

scala> CSVContents
res0: List[String] = List(1|2|3|4, 5|6|7|8, 9|10|11|12)

One way to go from res0 to List[List[Any]] is with a regular expression to greedily extract digits. The regular expression for this is simply "\\d+".r in Scala.

val digitRegex = "\\d+".r
var CSVContents = CSVReader(ALHCorpus, delimiter).map(x => digitRegex.findAllIn(x).toList) 

Now CSVContents becomes this:

CSVContents: List[List[String]] = List(List(1, 2, 3, 4), List(5, 6, 7, 8), List(9, 10, 11, 12))
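Note that the cells are still strings at this point. If actual integers are wanted, each match can be converted, roughly as in the sketch below. The helper name digitsToInts is hypothetical. This only suits data made of non-negative integers: a sign is dropped, and a value like 2.5 would be read as two separate numbers.

```scala
val digitRegex = "\\d+".r

// For each line, pull out every run of digits and convert it to an Int.
// Assumes each run fits in an Int; toInt throws NumberFormatException otherwise.
def digitsToInts(lines: List[String]): List[List[Int]] =
  lines.map(line => digitRegex.findAllIn(line).map(_.toInt).toList)
```

Applied to the three sample lines from the question, this should give a List[List[Int]] with the same shape as the output above.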
Brian
  • I appreciate the response, but the main issue is not about how to (or whether to) properly set the type of the return value. The main question is how does one perform the necessary "string and type gymnastics" to get the form that I want. – GrantD71 Sep 24 '13 at 23:42
  • @GrantD71 I've updated the answer using a regular expression, which I think is a simpler approach than filtering and mapping as I said in an initial answer. – Brian Sep 24 '13 at 23:52
  • I didn't mean to specify a use case that only has integers. My actual problem has columns with many different types of data. – GrantD71 Sep 24 '13 at 23:59
  • @GrantD71 Making it work with different types in each column would require more work. Say you have `String`, `Int`, and `Double` types in the columns; then you have to parse each column with the appropriate regex. In Scala, this can be done using `RegexParsers` parser combinators and defining a parser for each type. Then construct a `Parser` to parse one of these types at each column. Even with this, the element type of the resulting `List` would be `Any`. See http://stackoverflow.com/questions/5063022/use-scala-parser-combinator-to-parse-csv-files. – Brian Sep 25 '13 at 04:59
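For mixed column types, a lighter-weight alternative to parser combinators is to try each conversion in turn per cell. This is a different technique from the `RegexParsers` approach mentioned in the comment, shown only as a rough sketch. The names parseCell and parseRow are hypothetical, and the Int-then-Double-then-String order is an assumption about the data.

```scala
import java.util.regex.Pattern
import scala.util.Try

// Convert one cell to the first type that parses: Int, then Double,
// otherwise keep the trimmed String. The result type is necessarily Any.
def parseCell(cell: String): Any = {
  val trimmed = cell.trim
  val asInt: Try[Any] = Try(trimmed.toInt)
  asInt.orElse(Try(trimmed.toDouble)).getOrElse(trimmed)
}

// Split a line on a literal delimiter and convert every cell.
def parseRow(line: String, delimiter: String): List[Any] =
  line.split(Pattern.quote(delimiter), -1).toList.map(parseCell)
```

As the comment notes, the rows still come out as List[Any], so callers have to pattern match on each value. Be aware that toDouble also accepts strings such as "NaN" or "1d", which may or may not be what you want.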
0

Assuming a Seq of tuples would be acceptable (and, looking at your comments, this is probably what you want), you can do this with product-collections, which uses opencsv internally.

 scala> CsvParser[Int,Int,Int,Int].parseFile("x", delimiter="|")
 res2: org.catch22.collections.immutable.CollSeq4[Int,Int,Int,Int] = 
 CollSeq((1,2,3,4),
         (5,6,7,8),
         (9,10,11,12))
Mark Lister