I have created a case class like this:

def case_class(): Unit = {
   case class StockPrice(quarter : Byte,
                      stock : String,
                      date : String,
                      open : Double,
                      high : Double,
                      low : Double,
                      close : Double,
                      volume : Double,
                      percent_change_price : Double,
                      percent_change_volume_over_last_wk : Double,
                      previous_weeks_volume : Double,
                      next_weeks_open : Double,
                      next_weeks_close : Double,
                      percent_change_next_weeks_price : Double,
                      days_to_next_dividend : Double,
                      percent_return_next_dividend : Double
                     )
}

And I have thousands of lines as an Array of String, like this:

1,AA,1/7/2011,$15.82,$16.72,$15.78,$16.42,239655616,3.79267,,,$16.71,$15.97,-4.42849,26,0.182704

1,AA,1/14/2011,$16.71,$16.71,$15.64,$15.97,242963398,-4.42849,1.380223028,239655616,$16.19,$15.79,-2.47066,19,0.187852

1,AA,1/21/2011,$16.19,$16.38,$15.60,$15.79,138428495,-2.47066,-43.02495926,242963398,$15.87,$16.13,1.63831,12,0.189994

1,AA,1/28/2011,$15.87,$16.63,$15.82,$16.13,151379173,1.63831,9.355500109,138428495,$16.18,$17.14,5.93325,5,0.185989

How can I parse the data from the Array into that case class? Thank you for your help!

talex
Key Jun
  • https://stackoverflow.com/questions/22244643/scala-case-class-arguments-instantiation-from-array – gaston Nov 28 '18 at 10:13
  • It must be stressed that any function to do this will likely blow up if the Array doesn't contain the correct number of fields or doesn't contain the right data types. You'll need to do that validation for yourself and handle any error cases. – James Whiteley Nov 28 '18 at 11:13

2 Answers

You can proceed as below (I've used a simplified example).

Given your case class and data (lines)

// Your case-class
case class MyCaseClass(
  fieldByte: Byte,
  fieldString: String,
  fieldDouble: Double
)

// input data
val lines: List[String] = List(
  "1,AA,$1.1",
  "2,BB,$2.2",
  "3,CC,$3.3"
)

Note: you can read lines from a text file as

import scala.io.Source

val lines = Source.fromFile("my_file.txt").getLines().toList

You can have some utility methods for mapping (cleaning & parsing)

// remove '$' symbols from string
def removeDollars(line: String): String = line.replaceAll("\\$", "")

// split string into tokens and
// convert into MyCaseClass object
def parseLine(line: String): MyCaseClass = {
  val tokens: Seq[String] = line.split(",").toSeq
  MyCaseClass(
    fieldByte = tokens(0).toByte,
    fieldString = tokens(1),
    fieldDouble = tokens(2).toDouble
  )
}

And then use them to convert strings into case-class objects

// conversion
val myCaseClassObjects: Seq[MyCaseClass] = lines.map(removeDollars).map(parseLine)
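If some fields can be empty (as in the first data row of the question), `"".toDouble` throws a `NumberFormatException`. One possible way around that, sketched here under the assumption that a missing value should stay visibly missing, is to model such fields as `Option[Double]`. The names `MyCaseClassOpt`, `parseOptionalDouble` and `parseLineOpt` are made up for this illustration.

```scala
// Hypothetical variant of the simplified case class: the numeric field may be missing
case class MyCaseClassOpt(
  fieldByte: Byte,
  fieldString: String,
  fieldDouble: Option[Double]
)

// "" becomes None, "$1.1" becomes Some(1.1).
// Non-numeric text still throws NumberFormatException.
def parseOptionalDouble(token: String): Option[Double] = {
  val cleaned = token.replace("$", "").trim
  if (cleaned.isEmpty) None else Some(cleaned.toDouble)
}

def parseLineOpt(line: String): MyCaseClassOpt = {
  // A limit of -1 keeps trailing empty tokens, so "2,BB," still yields three tokens
  val tokens = line.split(",", -1)
  MyCaseClassOpt(
    fieldByte = tokens(0).toByte,
    fieldString = tokens(1),
    fieldDouble = parseOptionalDouble(tokens(2))
  )
}

val parsed: List[MyCaseClassOpt] = List("1,AA,$1.1", "2,BB,").map(parseLineOpt)
// parsed(1).fieldDouble is None because the third field is empty
```

Whether `None` or a default such as `0` is the better choice depends on what an empty field means in your data.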

As a more advanced (and generalized) approach, you can generate the mapping (parsing) function that converts tokens into the fields of your case class using something like reflection, as described here

y2k-shubham
  • I don't know why, but it only works after I delete `toDouble` and change all the data types in `StockPrice` to `String` – Key Jun Nov 28 '18 at 13:44
  • @KeyJun my guess is that it's due to the empty fields in some of your lines. This is why I've added the `.map { case x if .....` section to my answer. If you do something like that, this answer will work. – James Whiteley Nov 29 '18 at 15:52
Here's one way of doing it. I'd recommend splitting the work into lots of small, easy-to-manage functions; otherwise you will get lost trying to figure out where something is going wrong once it starts throwing exceptions.

Data setup:

val array = Array("1,AA,1/7/2011,$15.82,$16.72,$15.78,$16.42,239655616,3.79267,,,$16.71,$15.97,-4.42849,26,0.182704",
  "1,AA,1/14/2011,$16.71,$16.71,$15.64,$15.97,242963398,-4.42849,1.380223028,239655616,$16.19,$15.79,-2.47066,19,0.187852",
  "1,AA,1/21/2011,$16.19,$16.38,$15.60,$15.79,138428495,-2.47066,-43.02495926,242963398,$15.87,$16.13,1.63831,12,0.189994",
  "1,AA,1/28/2011,$15.87,$16.63,$15.82,$16.13,151379173,1.63831,9.355500109,138428495,$16.18,$17.14,5.93325,5,0.185989")

case class StockPrice(quarter: Byte, stock: String, date: String, open: Double,
  high: Double, low: Double, close: Double, volume: Double, percent_change_price: Double,
  percent_change_volume_over_last_wk: Double, previous_weeks_volume: Double,
  next_weeks_open: Double, next_weeks_close: Double, percent_change_next_weeks_price: Double,
  days_to_next_dividend: Double, percent_return_next_dividend: Double
)

Function to turn Array[String] into Array[List[String]] and handle any empty fields (I've made an assumption here that you want empty fields to be 0. Change this as necessary):

def splitArray(arr: Array[String]): Array[List[String]] = {
  arr.map(
    _.replaceAll("\\$", "")         // Remove $
      .split(",", -1)               // Split by , (-1 keeps trailing empty fields)
      .map {
        case x if x.isEmpty => "0"  // If empty
        case y => y                 // If not empty
      }
      .toList
  )
}
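One detail of `String.split` matters when a line can end in empty fields: called without a limit argument, it drops trailing empty strings, so such a row would produce fewer tokens than expected. A small sketch (the sample row is made up for illustration):

```scala
val row = "1,AA,,"

// Default split: trailing empty strings are removed
val dropped: List[String] = row.split(",").toList      // List("1", "AA")

// Negative limit: trailing empty strings are kept
val kept: List[String] = row.split(",", -1).toList     // List("1", "AA", "", "")
```

Empty fields in the middle of a line (like the `,,` in the first sample row) are kept either way; only trailing ones are affected.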

Function to turn a List[String] into a StockPrice. Note that this will throw a MatchError if the List isn't exactly 16 items long, and a NumberFormatException if a field can't be converted by the relevant .toDouble or .toByte call. I'll leave you to handle those cases. Also, the names are pretty non-descriptive, so you can change them too:

def toStockPrice: List[String] => StockPrice = {
  case a :: b :: c :: d :: e :: f :: g :: h :: i :: j :: k :: l :: m :: n :: o :: p :: Nil =>
    StockPrice(a.toByte, b, c, d.toDouble, e.toDouble, f.toDouble, g.toDouble, h.toDouble, i.toDouble, j.toDouble,
      k.toDouble, l.toDouble, m.toDouble, n.toDouble, o.toDouble, p.toDouble)
}
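If you would rather collect failures than let one bad row abort everything, one option is to wrap the call in `scala.util.Try` and turn the result into an `Either`. This is only a sketch: `safeToStockPrice` and the sample `rows` are hypothetical, and the case class and `toStockPrice` are repeated so the snippet stands on its own.

```scala
import scala.util.Try

case class StockPrice(quarter: Byte, stock: String, date: String, open: Double,
  high: Double, low: Double, close: Double, volume: Double, percent_change_price: Double,
  percent_change_volume_over_last_wk: Double, previous_weeks_volume: Double,
  next_weeks_open: Double, next_weeks_close: Double, percent_change_next_weeks_price: Double,
  days_to_next_dividend: Double, percent_return_next_dividend: Double
)

def toStockPrice: List[String] => StockPrice = {
  case a :: b :: c :: d :: e :: f :: g :: h :: i :: j :: k :: l :: m :: n :: o :: p :: Nil =>
    StockPrice(a.toByte, b, c, d.toDouble, e.toDouble, f.toDouble, g.toDouble, h.toDouble, i.toDouble, j.toDouble,
      k.toDouble, l.toDouble, m.toDouble, n.toDouble, o.toDouble, p.toDouble)
}

// Hypothetical wrapper: returns Left(description) instead of throwing
def safeToStockPrice(fields: List[String]): Either[String, StockPrice] =
  Try(toStockPrice(fields)).toEither.left.map { e =>
    s"Could not parse '${fields.mkString(",")}': ${e.getClass.getSimpleName}"
  }

// One well-formed row (16 fields) and one row that is too short
val rows: List[List[String]] = List(
  "1,AA,1/7/2011,15.82,16.72,15.78,16.42,239655616,3.79267,0,0,16.71,15.97,-4.42849,26,0.182704".split(",").toList,
  List("1", "AA")
)

val results: List[Either[String, StockPrice]] = rows.map(safeToStockPrice)
val prices: List[StockPrice] = results.collect { case Right(p) => p }
val errors: List[String] = results.collect { case Left(msg) => msg }
```

`Try` catches both the MatchError (wrong number of fields) and the NumberFormatException (bad number), so `errors` ends up with one message per rejected row.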

A nice function to bring this all together:

def makeCaseClass(arr: Array[String]): Seq[StockPrice] = {
  val splitArr: Array[List[String]] = splitArray(arr)
  splitArr.map(toStockPrice).toSeq
}

Output:

println(makeCaseClass(array))

//ArraySeq(
// StockPrice(1,AA,1/7/2011,15.82,16.72,15.78,16.42,2.39655616E8,3.79267,0.0,0.0,16.71,15.97,-4.42849,26.0,0.182704), 
// StockPrice(1,AA,1/14/2011,16.71,16.71,15.64,15.97,2.42963398E8,-4.42849,1.380223028,2.39655616E8,16.19,15.79,-2.47066,19.0,0.187852), 
// StockPrice(1,AA,1/21/2011,16.19,16.38,15.6,15.79,1.38428495E8,-2.47066,-43.02495926,2.42963398E8,15.87,16.13,1.63831,12.0,0.189994), 
// StockPrice(1,AA,1/28/2011,15.87,16.63,15.82,16.13,1.51379173E8,1.63831,9.355500109,1.38428495E8,16.18,17.14,5.93325,5.0,0.185989)
//)

Edit:

To explain the a :: b :: c ..... bit - this is a way of assigning names to items in a List or Seq, given you know the List's size.

val ls = List(1, 2, 3)
val a :: b :: c :: Nil = List(1, 2, 3)
println(a == ls.head) // true
println(b == ls(1)) // true
println(c == ls(2)) // true

Note that the Nil is important because it matches the empty list that terminates every List, so the pattern only matches a List of exactly three elements. Without it, c would be equal to List(3), as the rest of the List is bound to the last name in the pattern.
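To see the difference concretely, here is a small sketch using `match` (the names `withoutNil` and `withNil` are made up for illustration):

```scala
// Without Nil, the last name binds the remaining List
val withoutNil: List[Int] = List(1, 2, 3) match {
  case _ :: _ :: c => c              // c is a List[Int], here List(3)
  case _           => Nil
}

// With Nil, the last name binds a single element
val withNil: Option[Int] = List(1, 2, 3) match {
  case _ :: _ :: c :: Nil => Some(c) // c is an Int, here 3
  case _                  => None
}
```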

You can use this in pattern matching as I have in order to do something with the results:

val ls = List(1, "b", true)
ls match {
  case a :: b :: c if c == true => println("this will not be printed")
  case a :: b :: c :: Nil if c == true => println(s"this will get printed because c == $c")
} // not exhaustive but you get the point

You can also use it if you know what you want each element in the List to be, like this:

val personCharacteristics = List("James", 26, "blue", 6, 85.4, "brown")
val name :: age :: eyeColour :: otherCharacteristics = personCharacteristics
println(s"Name: $name; Age: $age; Eye colour: $eyeColour")
// Name: James; Age: 26; Eye colour: blue

Obviously these examples are pretty trivial and not exactly what you'd see as a professional Scala developer (at least I don't), but it's a handy thing to be aware of as I do still use this :: syntax at work sometimes.

James Whiteley
  • This is also a good way. Thank you. But I only can accept one answer. – Key Jun Nov 28 '18 at 14:24
  • Not to worry. This was an interesting challenge to try to solve. Glad I could help. – James Whiteley Nov 28 '18 at 14:26
  • Could you please tell me what exactly this line `a :: b :: c :: d :: e :: f :: g :: h :: i :: j :: k :: l :: m :: n :: o :: p :: Nil` does? – Key Jun Nov 28 '18 at 14:26
  • @KeyJun I've edited my post to explain it a little more. It's essentially giving a name to the elements in the List, assuming the List is exactly 16 items long. – James Whiteley Nov 28 '18 at 14:51
  • Thank you very much for your coherent explanations. I can learn a lot from your code. Thanks again and have a nice day :) – Key Jun Nov 28 '18 at 15:07
  • I have one more question: when I print my ArraySeq in IntelliJ, it shows everything in one long line. How can I show it like a dataframe or something similar? – Key Jun Nov 29 '18 at 16:06
  • Maybe something like `makeCaseClass(array).foreach(println)`? I'm not very familiar with dataframes I'm afraid. You could define your own `toString` for the StockPrice case class and print that I guess – James Whiteley Nov 29 '18 at 16:09