I'm selecting columns from a JSON file, transforming some of them, and would like to store the result as a Parquet file, but it fails.

This is what I'm doing:

val jsonFiles=sqlContext.jsonFile("/requests.loading")
jsonFiles.registerTempTable("jRequests")

val clean_jRequests=sqlContext.sql("select c1, c2, c3 ... c55 from jRequests")

and then I run a map:

val jRequests_flat=clean_jRequests.map(line=>{((line(1),line(2),line(3),line(4),line(5),line(6),line(7),line(8).asInstanceOf[Iterable[String]].mkString(","),line(9) ,line(10) ,line(11) ,line(12) ,line(13) ,line(14) ,line(15) ,line(16) ,line(17) ,line(18) ,line(19) ,line(20) ,line(21) ,line(22) ,line(23) ,line(24) ,line(25) ,line(26) ,line(27) ,line(28) ,line(29) ,line(30) ,line(31) ,line(32) ,line(33) ,line(34) ,line(35) ,line(36) ,line(37) ,line(38) ,line(39) ,line(40) ,line(41) ,line(42) ,line(43) ,line(44) ,line(45) ,line(46) ,line(47) ,line(48) ,line(49) ,line(50)))})
  1. Is there a smarter way to achieve this (modifying only a certain column without touching the others, but keeping all of them)?
  2. The last statement fails because the tuple has too many members: <console>:19: error: object Tuple50 is not a member of package scala


Thanks,
Daniel

Daniel Haviv
1 Answer

Do you need all 55 columns?

You can create a case class that holds just the columns you need and save this subset.

Of course, you can keep the original file with all the data in case you need it in the future.
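
A minimal sketch of the case-class idea, in plain Scala without Spark. The class name `RequestSubset`, the choice of three columns, and their `String` types are assumptions made for illustration; a row is modelled here as a plain `Seq[Any]`, not as Spark's own row type.

```scala
// Hypothetical subset of the columns; names and types are assumptions.
case class RequestSubset(c1: String, c2: String, c8: String)

object SubsetExample {
  // Build the subset from an untyped row (a stand-in for one query result row).
  // The third value is assumed to be a collection of strings, which is
  // flattened into a single comma-separated string.
  def toSubset(row: Seq[Any]): RequestSubset =
    RequestSubset(
      row(0).toString,
      row(1).toString,
      row(2).asInstanceOf[Iterable[String]].mkString(",")
    )
}
```

For example, `SubsetExample.toSubset(Seq("a", "b", Seq("x", "y")))` yields a `RequestSubset` whose last field is the joined string.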

You are getting the Tuple50 error because you are hitting Scala's limit of 22 elements per tuple; see "Why does the Scala library only defines tuples up to Tuple22?"
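
One way to stay clear of the tuple limit is to keep each row as a sequence and replace only the position that needs changing. The sketch below is plain Scala, not Spark code: the helper `flattenAt` is hypothetical, a row is modelled as `Seq[Any]`, and turning the result back into something Spark can write to Parquet is not shown.

```scala
object FlattenColumn {
  // Return a copy of the row in which the value at `index` is joined into a
  // comma-separated string if it is a collection; all other positions are
  // left untouched, so the row can be arbitrarily wide.
  def flattenAt(row: Seq[Any], index: Int): Seq[Any] = {
    val flattened: Any = row(index) match {
      case items: Iterable[_] => items.mkString(",")
      case other              => other
    }
    row.updated(index, flattened)
  }
}
```

Because a `Seq` has no fixed arity, the same helper works for 50 columns as well as for 5, and only the one index has to be named.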

Soumya Simanta