1

Below Scala class parses a file using JDOM and populates the values from the file into a Scala immutable Map. Using the + operator on the Map does not seem to have any effect as the Map is always zero.

import java.io.File
import org.jsoup.nodes.Document
import org.jsoup.Jsoup
import org.jsoup.select.Elements
import org.jsoup.nodes.Element
import scala.collection.immutable.TreeMap

class JdkElementDetail() {

  var fileLocation: String = _

  def this(fileLocation: String) = {
      this()
      this.fileLocation = fileLocation;
    }


  def parseFile : Map[String , String] = {

    val jdkElementsMap: Map[String, String] = new TreeMap[String , String];
    val input: File = new File(fileLocation);
    val doc: Document = Jsoup.parse(input, "UTF-8", "http://example.com/");
    val e: Elements = doc.getElementsByAttribute("href");

    val href: java.util.Iterator[Element] = e.iterator();
    while (href.hasNext()) {
      var objectName = href.next();
      var hrefValue = objectName.attr("href");
      var name = objectName.text();

      jdkElementsMap + name -> hrefValue
            println("size is "+jdkElementsMap.size)
    }

    jdkElementsMap
  }

}

println("size is "+jdkElementsMap.size) always prints "size is 0"

Why is the size always zero, am I not adding to the Map correctly?

Is the only fix for this to convert jdkElementsMap to a var and then use the following?

jdkElementsMap += name -> hrefValue

Removing the while loop here is my updated object:

package com.parse

import java.io.File
import org.jsoup.nodes.Document
import org.jsoup.Jsoup
import org.jsoup.select.Elements
import org.jsoup.nodes.Element
import scala.collection.immutable.TreeMap
import scala.collection.JavaConverters._

class JdkElementDetail() {

  var fileLocation: String = _

  def this(fileLocation: String) = {
      this()
      this.fileLocation = fileLocation;
    }


  def parseFile : Map[String , String] = {

    var jdkElementsMap: Map[String, String] = new TreeMap[String , String];
    val input: File = new File(fileLocation);
    val doc: Document = Jsoup.parse(input, "UTF-8", "http://example.com/");
    val elements: Elements = doc.getElementsByAttribute("href");

    val elementsScalaIterator = elements.iterator().asScala

    elementsScalaIterator.foreach {
      keyVal => {
          var hrefValue = keyVal.attr("href");
          var name = keyVal.text();
          println("size is "+jdkElementsMap.size)
          jdkElementsMap += name -> hrefValue
       }
    }
    jdkElementsMap
  }

}
blue-sky
  • 51,962
  • 152
  • 427
  • 752
  • 1
    `x + z` has no effect on `x` when `x` is an immutable object; if it did have an effect it wouldn't be *immutable*! Use the return value, which is a *new* immutable map. –  Jan 30 '13 at 22:52
  • Well, not the "only" way, but it'll work. One alternative is recursion; preferably written TCO-friendly and verified with `@tailrec`. I'm sure someone will pipe up with a clever way to wrap the loop (hasNext/next) into a map (i.e. `for .. yield`) or mention something about monads :D –  Jan 30 '13 at 23:11
  • I would look at http://stackoverflow.com/a/2709031/166390 (mentions JavaConversions) and http://stackoverflow.com/questions/8301947/what-is-the-difference-between-javaconverters-and-javaconversions-in-scala (newer JavaConverters) - then you can get rid of that *icky* Java Iterator with the *icky* procedural loop and use a simple map operation why coming up with something "less scary" than other options :D –  Jan 30 '13 at 23:16

2 Answers2

7

Immutable data structures -- be they lists or maps -- are just that: immutable. You don't ever change them, you create new data structures based on changes to the old ones.

If you do val x = jdkElementsMap + (name -> hrefValue), then you'll get the new map on x, while jdkElementsMap continues to be the same.

If you change jdkElementsMap into a var, then you could do jdkEleemntsMap = jdkElementsMap + (name -> hrefValue), or just jdkElementsMap += (name -> hrefValue). The latter will also work for mutable maps.

Is that the only way? No, but you have to let go of while loops to achieve the same thing. You could replace these lines:

val href: java.util.Iterator[Element] = e.iterator();
while (href.hasNext()) {
  var objectName = href.next();
  var hrefValue = objectName.attr("href");
  var name = objectName.text();

  jdkElementsMap + name -> hrefValue
        println("size is "+jdkElementsMap.size)
}

jdkElementsMap

With a fold, such as in:

import scala.collection.JavaConverters.asScalaIteratorConverter

e.iterator().asScala.foldLeft(jdkElementsMap) {
  case (accumulator, href) =>  // href here is not an iterator
    val objectName = href
    val hrefValue = objectName.attr("href")
    val name = objectName.text()

    val newAccumulator = accumulator + (name -> hrefValue)

    println("size is "+newAccumulator.size)

    newAccumulator
}

Or with recursion:

def createMap(hrefIterator: java.util.Iterator[Element],
              jdkElementsMap: Map[String, String]): Map[String, String] = {
  if (hrefIterator.hasNext()) {
    val objectName = hrefIterator.next()
    val hrefValue = objectName.attr("href")
    val name = objectName.text()

    val newMap = jdkElementsMap + name -> hrefValue

    println("size is "+newMap.size)

    createMap(hrefIterator, newMap)
  } else {
     jdkElementsMap
  }
}

createMap(e.iterator(), new TreeMap[String, String])

Performance-wise, the fold will be rather slower, and the recursion should be very slightly faster.

Mind you, Scala does provide mutable maps, and not just to be able to say it has them: if they fit better you problem, then go ahead and use them! If you want to learn how to use the immutable ones, then the two approaches above are the ones you should learn.

Daniel C. Sobral
  • 295,120
  • 86
  • 501
  • 681
3

The map is immutable, so any modifications will return the modified map. jdkElementsMap + (name -> hrefValue) returns a new map containing the new pair, but you are discarding the modified map after it is created.

EDIT: It looks like you can convert Java iterables to Scala iterables, so you can then fold over the resulting sequence and accumulate a map:

import scala.collection.JavaConverters._
val e: Elements = doc.getElementsByAttribute("href");
val jdkElementsMap = e.asScala
    .foldLeft(new TreeMap[String , String])((map, href) => map + (href.text() -> href.attr("href"))

if you don't care what kind of map you create you can use toMap:

val jdkElementsMap = e.asScala
    .map(href => (href.text(), href.attr("href")))
    .toMap
Lee
  • 142,018
  • 20
  • 234
  • 287