
I want to use a Java library in my Scala program. The library contains a generic class that is used inside other classes:

package java.items;

public class Item<T extends Comparable> implements Comparable<Item> {  
  private T id;
 ...
}

public final class Itemset{
  private List<Item> items = new ArrayList<Item>();
  public List<Item> getItems() { return items; }
 ...
}

public class Sequence {
  private final List<Itemset> itemsets = new ArrayList<Itemset>();
  public List<Itemset> getItemsets() { return itemsets; }
 ...
}

In my Scala code, I loop over the different objects and need to instantiate a hashmap of type [T, Int] to store the Ids with a counter:

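import java.items._
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD
import scala.collection.JavaConversions._
import scala.collection.mutable.HashMap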

object ConvertSequence {

  def ConvertSequence (dataset: RDD[(Sequence)], sc: SparkContext) {

    sc.broadcast(dataset.flatMap(r => {
      val itemCounts = new HashMap[AnyRef, Int]

      for (itemset <- r.getItemsets) {
        for (item <- itemset.getItems) {
          val i = itemCounts.getOrElse(item.getId, 0)
          itemCounts.update(item.getId, i + 1)
        }
      }
      itemCounts
    }).
    map(r => (r._1, (r._2, 1))).
    reduceByKey((x, y) => (x._1 + y._1, x._2 + y._2)).
    sortBy(r => (r._2._1, r._1)).
    zipWithIndex().
    collect({ case (k, v) => (k._1, v)})
  )
  }
}

I don't know which type to pass to the hashmap constructor (T is not available from my Scala object, as only Item is typed). I've tried AnyRef, but I get an error at compilation:

[ERROR]  error: type mismatch;
[INFO]  found   : ?0
[INFO]  required: AnyRef
[INFO] Note that ?0 is unbounded, which means AnyRef is not a known parent.
[INFO] Such types can participate in value classes, but instances
[INFO] cannot appear in singleton types or in reference comparisons.
[INFO]           val i = itemCounts.getOrElse(item.getId, 0)
[INFO]                                             ^
[ERROR] one error found

How can I manage polymorphism between my Java and Scala code?

Alex
  • What are the types of `r.getItemsets` and `itemset.getItems`? What are the parameters to the method in which your Scala code resides? – Dan Getz Mar 02 '15 at 18:28
  • "T is not available from my scala object" - sounds like a problem with generics, not a problem specifically in Scala. Would you be able to write the working method in Java? – user253751 Mar 02 '15 at 18:34
  • r.getItemsets is a list of itemsets and itemset.getItems is a list of items. – Alex Mar 02 '15 at 19:24
  • @Alex we're talking about polymorphism and generics, so what would be important would be the actual return type (including generics) of those methods. In any case, I can see that you're using raw types in your Java code. You should never do that in new code. See http://stackoverflow.com/q/2770321/3004881 – Dan Getz Mar 02 '15 at 21:05
  • @Dan, I've added the code of the methods. As you can see, there is no type parameter on the return type of these methods (classes Sequence and Itemset). Should I add a type parameter to the Sequence and Itemset classes? BTW, the Java code is not mine; it comes from an external library, so I would have preferred not to change it. – Alex Mar 03 '15 at 09:50
  • @Alex the question I linked explains why you shouldn't use raw types and what the alternatives are. By the way, the error you posted and the code you posted don't match: in the code, you call `itemCounts.getOrElse(item.getId.toString, 0)`, but in your error, it's `itemCounts.getOrElse(item.getId, 0)`. – Dan Getz Mar 03 '15 at 13:46
  • What is the type that `sortBy` is called on, and/or what do you intend for the return type of `sortBy` to be? – Dan Getz Mar 03 '15 at 16:56
  • @Dan: My bad, I mixed up the copy/paste between two of my attempts. I've edited my code and added some details. The output of my function is a collection of tuples that I broadcast over my cluster. It's basically the list of my items indexed by the rank of their frequency in the sequences (ItemId, Rank). The .sortBy is done on a lazy distributed collection of tuples (RDD) and outputs the same type. – Alex Mar 03 '15 at 17:57

2 Answers


I coded up a basic scenario based on your problem and did not run into the error. Without more information from you it's hard to say exactly what is going wrong. Ideally we'd see the whole object the Scala code occurs in, or at minimum the header of the method containing the posted code, so we can check all of the types. Here's what I wrote that seems to work; maybe something in it will fix your issue:

Java class with generics:

package javaCompat;

public class Item<T> {

    public final T id;

    public Item(T id){
        this.id = id;
    }
}

Scala code that uses the generic Java class:

import javaCompat.Item
import scala.collection.mutable.HashMap

object Compat {
  def main(args : Array[String]){
    val items = 
          List("A","B","C","D","E","A","B","A","C","E","F","D").map {x => new Item(x)}
    print(labelCount(items))
  }

  def labelCount[T](items : List[Item[T]]) : HashMap[T, Int] = {
    val itemCounts = new HashMap[T, Int]()
    for (item <- items) {
      val i = itemCounts.getOrElse(item.id, 0)
      itemCounts.update(item.id, i + 1)
    }
    itemCounts
  }
}
Mshnik

Partial solution (without being able to sort the ids)

If you have any control over the Java code, you should never use raw types, like the Item in List<Item>, if they're avoidable. See the answer to this question for more information.
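To make the raw-type point concrete, here is a minimal sketch of what the library's classes could look like if the type parameter were carried through. The names `TypedItem`, `TypedItemset`, `RawTypeSketch` and `countIds` are hypothetical and not part of the library; the only point is that once `List<Item>` becomes `List<Item<T>>`, `getId` returns a `T` instead of a value of unknown type.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical parameterized counterpart of the library's Item.
class TypedItem<T extends Comparable<T>> {
    private final T id;
    TypedItem(T id) { this.id = id; }
    T getId() { return id; }
}

// Hypothetical parameterized counterpart of Itemset: the element type is
// TypedItem<T> rather than a raw type, so T stays visible to callers.
final class TypedItemset<T extends Comparable<T>> {
    private final List<TypedItem<T>> items = new ArrayList<>();
    void add(T id) { items.add(new TypedItem<>(id)); }
    List<TypedItem<T>> getItems() { return items; }
}

class RawTypeSketch {
    // Counts how often each id occurs; the key type is T, so no cast is needed.
    static <T extends Comparable<T>> Map<T, Integer> countIds(TypedItemset<T> itemset) {
        Map<T, Integer> counts = new HashMap<>();
        for (TypedItem<T> item : itemset.getItems()) {
            counts.merge(item.getId(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        TypedItemset<String> itemset = new TypedItemset<>();
        itemset.add("A");
        itemset.add("B");
        itemset.add("A");
        System.out.println(countIds(itemset));
    }
}
```

With classes shaped like this, a Scala caller could presumably declare its own method with a type parameter `T` and use `HashMap[T, Int]` directly.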

If you can't fix the Java code, then item.getId is going to return an object of unknown type (the ?0 in the compiler message), which leads to the error you saw. You almost found a solution to this when you tried treating it as an AnyRef. The catch is that AnyRef is not the base type of all types in Scala; Any is. AnyRef is the base type of all reference types, the ones that can be null, but value types such as Int extend AnyVal instead and can't be null. So part of your code should work if you define itemCounts as follows:

val itemCounts = new HashMap[Any, Int]

If you want the key type of itemCounts to be something specific that you know is a supertype of all the items' ids, you will have to cast, with asInstanceOf, either the items:

val castedItem = item.asInstanceOf[Item[String]]
val castedItem = item.asInstanceOf[Item[AnyRef]]

or the ids:

val castedId = item.getId.asInstanceOf[Integer]
val castedId = item.getId.asInstanceOf[AnyRef]
Dan Getz
  • Oh right. Well, the Java class extended Comparable in the wrong way, to begin with. Are you sure the Java library isn't open source or from people you know? Are you sure the type of the ids can't be known when the function is called? – Dan Getz Mar 03 '15 at 14:54
  • The library is GPL, so I can modify it if needed; if you have some guidance for improving the code... The id should be a String or an Int, but my point was to not have to worry about the type. I don't understand what you mean by "Java class extended Comparable in the wrong way"? – Alex Mar 03 '15 at 15:21
  • I meant by using generics wrong: `Item<T extends Comparable> implements Comparable<Item>` should have been `Item<T extends Comparable<T>> implements Comparable<Item<T>>` – Dan Getz Mar 03 '15 at 15:29
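
A declaration with fully parameterized bounds, `Item<T extends Comparable<T>> implements Comparable<Item<T>>`, can be sketched as follows. `FixedItem`, `ComparableSketch` and `sortedIds` are hypothetical names, and ordering items by their ids is an assumption about what `compareTo` is meant to do, since the library's implementation is elided in the question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for the library's Item with fully parameterized bounds.
class FixedItem<T extends Comparable<T>> implements Comparable<FixedItem<T>> {
    private final T id;
    FixedItem(T id) { this.id = id; }
    T getId() { return id; }

    // Assumption: items are ordered by their ids.
    @Override
    public int compareTo(FixedItem<T> other) {
        return id.compareTo(other.id);
    }
}

class ComparableSketch {
    // Wraps the ids in items, sorts the items, and returns the ids in ascending order.
    static <T extends Comparable<T>> List<T> sortedIds(List<T> ids) {
        List<FixedItem<T>> items = new ArrayList<>();
        for (T id : ids) {
            items.add(new FixedItem<>(id));
        }
        // Type-checks because FixedItem<T> implements Comparable<FixedItem<T>>.
        Collections.sort(items);
        List<T> result = new ArrayList<>();
        for (FixedItem<T> item : items) {
            result.add(item.getId());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(sortedIds(Arrays.asList(3, 1, 2)));
    }
}
```

Because both the bound on `T` and the `Comparable` argument are parameterized, the compiler knows the id type at every use site, which is what the raw declaration loses.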