I want to create an ArrayBuffer in Scala without fixing its element type up front. I want to check a condition first and then choose the element type dynamically. Look at the code below.

def rowGen(startNumber:Int,tableIdentifier:String,NumRows:Int)={
var tmpArrayBuffer:collection.mutable.ArrayBuffer[_]=null  // I tried [T] here. That didn't work either.
tableIdentifier match {
case value if value==baseTable => tmpArrayBuffer= new collection.mutable.ArrayBuffer[(String,String,String,String)]()
case value if value==batchTable => tmpArrayBuffer= new collection.mutable.ArrayBuffer[(String,String)]()
}
for (currentNum <- startNumber to startNumber+NumRows)
tableIdentifier match {
case value if value==baseTable => tmpArrayBuffer+=(s"col1-${currentNum}",s"col2-${currentNum}",s"col3-${currentNum}",s"col4-${currentNum}")
case value if value==batchTable => tmpArrayBuffer+=(s"col1-${currentNum}",s"col2-${currentNum}")
}
tableIdentifier match {
case value if value==baseTable => tmpArrayBuffer.toSeq.toDF("col1","col2","col3","col4")
case value if value==batchTable => tmpArrayBuffer.toSeq.toDF("col1","col2")
}
}

Kindly help me with this. Based on a condition, I want to instantiate either ArrayBuffer[(String,String)] or ArrayBuffer[(String,String,String,String)].

Raptor0009

1 Answer

I would just define the array buffer inside the match:

import org.apache.spark.sql.DataFrame

val baseTable = "baseTable"
val batchTable = "batchTable"

def rowGen(startNumber:Int, tableIdentifier:String, NumRows:Int) : DataFrame = {
    // toDF needs spark.implicits._ in scope (spark-shell imports it automatically).
    tableIdentifier match {
        case `baseTable` => {
            // val is enough: the buffer is mutated in place, never reassigned.
            val tmpArrayBuffer = new collection.mutable.ArrayBuffer[(String,String,String,String)]
            // `to` is inclusive, so this loop produces NumRows + 1 rows.
            for (currentNum <- startNumber to startNumber+NumRows){
                tmpArrayBuffer += ((s"col1-${currentNum}",s"col2-${currentNum}",s"col3-${currentNum}",s"col4-${currentNum}"))
            }
            tmpArrayBuffer.toSeq.toDF("col1","col2","col3","col4")
        }
        case `batchTable` => {
            val tmpArrayBuffer = new collection.mutable.ArrayBuffer[(String,String)]
            for (currentNum <- startNumber to startNumber+NumRows) {
                tmpArrayBuffer += ((s"col1-${currentNum}",s"col2-${currentNum}"))
            }
            tmpArrayBuffer.toSeq.toDF("col1","col2")
        }
    }
}

scala> rowGen(1, "batchTable", 5).show
+------+------+
|  col1|  col2|
+------+------+
|col1-1|col2-1|
|col1-2|col2-2|
|col1-3|col2-3|
|col1-4|col2-4|
|col1-5|col2-5|
|col1-6|col2-6|
+------+------+

scala> rowGen(1, "baseTable", 5).show
+------+------+------+------+
|  col1|  col2|  col3|  col4|
+------+------+------+------+
|col1-1|col2-1|col3-1|col4-1|
|col1-2|col2-2|col3-2|col4-2|
|col1-3|col2-3|col3-3|col4-3|
|col1-4|col2-4|col3-4|col4-4|
|col1-5|col2-5|col3-5|col4-5|
|col1-6|col2-6|col3-6|col4-6|
+------+------+------+------+

Or, as suggested in the comments, use Seq.newBuilder instead of an ArrayBuffer:

import org.apache.spark.sql.DataFrame

val baseTable = "baseTable"
val batchTable = "batchTable"

def rowGen(startNumber:Int, tableIdentifier:String, NumRows:Int) : DataFrame = {
    // toDF needs spark.implicits._ in scope (spark-shell imports it automatically).
    tableIdentifier match {
        case `baseTable` => {
            // The builder is mutated in place, so a val is enough.
            val builder = Seq.newBuilder[(String,String,String,String)]
            for (currentNum <- startNumber to startNumber+NumRows){
                builder += ((s"col1-${currentNum}",s"col2-${currentNum}",s"col3-${currentNum}",s"col4-${currentNum}"))
            }
            // result() returns the immutable Seq that was built.
            builder.result().toDF("col1","col2","col3","col4")
        }
        case `batchTable` => {
            val builder = Seq.newBuilder[(String,String)]
            for (currentNum <- startNumber to startNumber+NumRows) {
                builder += ((s"col1-${currentNum}",s"col2-${currentNum}"))
            }
            builder.result().toDF("col1","col2")
        }
    }
}
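
The two branches differ only in the tuple they build, so the loop could also be factored into a generic helper. This is only a minimal sketch: `buildRows` is a hypothetical name, not part of Spark or of the code above. The caller picks the element type at each call site, so no untyped buffer is needed. It is plain Scala, and each resulting Seq could then be passed to toDF as in the versions above.

```scala
// Hypothetical helper: builds rows of any type T from a row-number function.
def buildRows[T](startNumber: Int, numRows: Int)(mkRow: Int => T): Seq[T] = {
  val builder = Seq.newBuilder[T]
  // Inclusive range, matching the loops above (numRows + 1 rows).
  for (n <- startNumber to startNumber + numRows) builder += mkRow(n)
  builder.result()
}

// The element type is inferred from the function passed in.
val batchRows = buildRows(1, 5)(n => (s"col1-$n", s"col2-$n"))
val baseRows  = buildRows(1, 5)(n => (s"col1-$n", s"col2-$n", s"col3-$n", s"col4-$n"))
```

The match on the table name would then only decide which row function and which column names to use.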
mck
  • Directly use a `Seq.newBuilder`, do not see any benefit from `ArrayBuffer` there – cchantep Dec 25 '20 at 08:36
  • Thanks @cchantep and mck for the help. I thought we could use the same variable in different places, but Seq.newBuilder works fine too. – Raptor0009 Dec 25 '20 at 09:40
  • @Raptor0009 you can, but I don't see the point of dynamically instantiating it beforehand. I'd just refactor the code to avoid the need for that, as I did in my answer. – mck Dec 25 '20 at 09:44
  • @mck If it can be done, can you help me with sample code for dynamically passing the element type to ArrayBuffer? I'm just curious. – Raptor0009 Dec 25 '20 at 14:53
  • @Raptor0009 It's not obvious to me how that can be done... you may need some fancy polymorphism to achieve that – mck Dec 25 '20 at 14:55
  • @Raptor0009 you can look at this question if you are interested - it's very complicated. https://stackoverflow.com/questions/35682984/generic-collection-generation-with-a-generic-type – mck Dec 25 '20 at 14:57
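
On the follow-up question: with `ArrayBuffer[_]` the element type is unknown to the compiler, so it rejects `+=`. One thing that does compile is a buffer typed to a common supertype. The following is only a sketch of that idea, using `Product`, which every tuple extends. The catch is that the static tuple type is lost, and since toDF relies on it, this is likely not usable as-is for the DataFrame step.

```scala
import scala.collection.mutable.ArrayBuffer

// Tuple2 and Tuple4 both extend Product, so one buffer can hold either.
val rows = ArrayBuffer.empty[Product]
rows += (("col1-1", "col2-1"))
rows += (("col1-1", "col2-1", "col3-1", "col4-1"))

// Only Product is known statically; the arity has to be recovered at runtime.
val arities = rows.map(_.productArity).toList
```

This is why building a correctly typed collection inside each match branch, as in the answer, is the simpler route.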