
I have a CSV file which contains a data matrix. The first column of this matrix contains a label and the other columns contain the values associated with that label (i.e. with the first column). Now I want to read this CSV file and put the data into a Map[String,Array[String]] in Scala. The key of the Map should be the label (the one in the first column) and the Map value should be the other values (the ones in the rest of the columns). To read the CSV file I use opencsv.

import scala.collection.JavaConverters._ // for readAll.asScala

val isr: InputStreamReader = new InputStreamReader(getClass.getResourceAsStream("test.csv"))
val data: IndexedSeq[Array[String]] = new CSVReader(isr).readAll.asScala.toIndexedSeq

Now I have all the data in an IndexedSeq[Array[String]]. Can I use this functional way here, or should I rather choose an iterative way, because it can get complex to read all the data at once? Well, now I need to create the Map from this IndexedSeq. Therefore I map the IndexedSeq to an IndexedSeq of Tuple2[String,Array[String]] to separate the label from the rest of the values, and then I create the Map from that.

val result: Map[String, Array[String]] = data.filter(e => !e.isEmpty).map(e => (e.head,e.tail)).toMap

This works for small examples, but when I use it to read the content of my CSV file it throws a java.lang.RuntimeException. I also tried to create the Map with a groupBy, and to create several Maps (one per line) and reduce them afterwards to one big Map (both sketched after the stack trace below), but without success. I also read another post on Stack Overflow where somebody assumes that toMap has a complexity of O(n²). This is the end of my stack trace (the whole stack trace is quite long):

Exception in thread "main" java.lang.reflect.InvocationTargetException      
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)  
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)  
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)  
    at java.lang.reflect.Method.invoke(Method.java:601)
    at org.jetbrains.plugins.scala.testingSupport.specs2.JavaSpecs2Runner.runSingleTest(JavaSpecs2Runner.java:130)  
    at org.jetbrains.plugins.scala.testingSupport.specs2.JavaSpecs2Runner.main(JavaSpecs2Runner.java:76)  
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)  
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)  
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)  
    at java.lang.reflect.Method.invoke(Method.java:601)  
    at com.intellij.rt.execution.application.AppMain.main(AppMain.java:120)  
    Caused by: java.lang.RuntimeException: can not create specification: com.test.MyClassSpec  
    at scala.sys.package$.error(package.scala:27)  
    at org.specs2.specification.SpecificationStructure$.createSpecification(BaseSpecification.scala:96)   
    at org.specs2.runner.ClassRunner.createSpecification(ClassRunner.scala:64)  
    at org.specs2.runner.ClassRunner.start(ClassRunner.scala:35)  
    at org.specs2.runner.ClassRunner.main(ClassRunner.scala:28)  
    at org.specs2.runner.NotifierRunner.main(NotifierRunner.scala:24)  
    ... 11 more  
    Process finished with exit code 1
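
For reference, the groupBy and reduce attempts mentioned above looked roughly like this (only a sketch of what I tried; it assumes each label occurs on exactly one line):

// Sketch of the groupBy attempt: group rows by their label and keep the value columns.
val viaGroupBy: Map[String, Array[String]] =
  data.filter(_.nonEmpty)
    .groupBy(_.head)
    .map { case (label, rows) => label -> rows.head.tail }

// Sketch of the "one Map per line, then reduce" attempt.
val viaReduce: Map[String, Array[String]] =
  data.filter(_.nonEmpty)
    .map(row => Map(row.head -> row.tail))
    .reduce(_ ++ _)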

Does anybody know another way to create a Map from the data in a CSV file?

  • Can you say precisely which `java.lang.RuntimeException` is thrown? (I mean, there could be a message, or if there is none, can you provide the stack trace?) – om-nom-nom Jun 21 '13 at 14:48
  • I added some of the stack trace. I hope this helps. Thanks. – bam098 Jun 21 '13 at 15:15
  • This seems to be a specs2-related issue; can you include your spec code as well? I'm guessing you are doing this map building in the constructor for the test spec and that is failing for some reason and stopping the spec from being created. – cmbaxter Jun 21 '13 at 15:28
  • Also, try this same code outside of a specs2 test specification and see if it works on its own or if you get a more specific error – cmbaxter Jun 21 '13 at 15:31
  • I'm not quite sure if I understand what you mean. What I basically do is that I have a class which has this CSV reading and map creation in the constructor. Some methods of the class then use this Map to calculate a specific value. With the Spec class I test whether the right value is calculated. Therefore I create an instance of the class in the spec and call the method. Hmm... should I maybe define a companion object, because this would be like a static class in Java, so that the CSV file is read only once? But I think this is not the reason for my actual problem. – bam098 Jun 21 '13 at 18:32
  • I tried to test it outside of the project in the REPL, but then I cannot create the InputStreamReader, but I need this because later it will be a JAR, so I have to read the file as a stream, I think. – bam098 Jun 21 '13 at 18:33
  • Oh, by the way... I calculate this value several times (with different input). It works for some iterations, but then I get the exception. – bam098 Jun 21 '13 at 18:45

3 Answers


This worked for me:

import scala.io.Source
Source.fromFile("some_very_big_file").getLines.map(_.split(";")).count(_ => true)

The split breaks up each line of the CSV file into simple records. The count is only there to check that the file is really read.

So now we can use this to read in a real CSV file (although I only tested it with a small file):

scala> val content=Source.fromFile("test.csv").getLines.map(_.split(";"))
content: Iterator[Array[java.lang.String]] = non-empty iterator

scala> val header=content.next
header: Array[java.lang.String] = Array(Elements, Duration)

scala> content.map(header.zip(_).toMap)
res40: Iterator[scala.collection.immutable.Map[java.lang.String,java.lang.String]] = non-empty iterator

This works quite well with simple CSV files. If you have more complex ones (e.g. entries split over several lines), you might have to use a more complex CSV parser (e.g. Apache Commons CSV). But usually such a parser will also give you some kind of iterator, and you can use the same map(... zip ...) function on it.
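
In case you want to see what that looks like, here is a sketch with Apache Commons CSV (the dependency and the exact builder methods are assumptions; they vary a bit between versions):

// Sketch: Commons CSV copes with quoted fields and records spanning several lines,
// and still yields an iterator of records, so the same header-zip trick applies.
import java.io.FileReader
import org.apache.commons.csv.CSVFormat
import scala.collection.JavaConverters._

val records = CSVFormat.DEFAULT
  .withDelimiter(';')                                            // match the ';'-separated file above
  .parse(new FileReader("test.csv"))
  .iterator().asScala
  .map(rec => (0 until rec.size).map(i => rec.get(i)).toArray)   // CSVRecord -> Array[String]

val header = records.next()                                      // first record is the header row
val rows: Iterator[Map[String, String]] =
  records.map(fields => header.zip(fields).toMap)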

– stefan.schwetschke
  • I'm not so familiar with the various methods of reading a file, but does this also work if I have my class inside a JAR library? I think I need to read the file as a stream, but I'm not sure. – bam098 Jun 21 '13 at 16:35
  • If you have your CSV inside a JAR library, you can use Class.getResourceAsStream (instead of Source.fromFile) to get an InputStream. The rest works as described above. – stefan.schwetschke Jun 26 '13 at 12:25
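
A minimal sketch of that suggestion (assuming test.csv sits at the root of the classpath; adjust the path to wherever it lives in your JAR):

// Read the CSV from the classpath instead of the file system, then proceed as above.
import scala.io.Source

val content = Source
  .fromInputStream(getClass.getResourceAsStream("/test.csv"))
  .getLines
  .map(_.split(";"))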

You could skip the intermediate collection of tuples and just build the map directly, like this:

val result: Map[String, Array[String]] = data.filter(e => !e.isEmpty).map(e => (e.head,e.tail))(collection.breakOut)

Not sure if this will fix your issue though, but you did ask if there was another way to build the map. You can read more about collection.breakOut here:

Scala: List[Tuple3] to Map[String,String]
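
For completeness, a single-pass alternative that needs neither the intermediate collection nor breakOut (just a sketch, not taken from the linked answer) is a plain foldLeft:

// Build the Map in one pass; empty lines are skipped along the way.
val result: Map[String, Array[String]] =
  data.foldLeft(Map.empty[String, Array[String]]) { (acc, row) =>
    if (row.isEmpty) acc else acc + (row.head -> row.tail)
  }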

– cmbaxter
  • Hey. Thanks for your answer, but I already tried breakOut (sorry, I forgot to mention it) and it didn't work either. – bam098 Jun 21 '13 at 15:10

Not quite what you asked for, but here's how to do it using my own dogfood:

scala> val data = CsvParser[String,Int,Double].parseFile("sample.csv")
data: org.catch22.collections.immutable.CollSeq3[String,Int,Double] = 
CollSeq((Jan,10,22.33),
        (Feb,20,44.2),
        (Mar,25,55.1))

scala> val lookup=(data._1 zip data).toMap
lookup: scala.collection.immutable.Map[String,Product3[String,Int,Double]] = Map(Jan -> (Jan,10,22.33), Feb -> (Feb,20,44.2), Mar -> (Mar,25,55.1))

scala> lookup("Feb")
res0: Product3[String,Int,Double] = (Feb,20,44.2)

product-collections

– Mark Lister