Remapping columns from a SchemaRDD

Question

I'm selecting columns from a JSON file, transforming some of them, and would like to store the result as a Parquet file, but it fails.

This is what I'm doing:

val jsonFiles=sqlContext.jsonFile("/requests.loading")
jsonFiles.registerTempTable("jRequests")

val clean_jRequests=sqlContext.sql("select c1, c2, c3 ... c55 from jRequests")

and then I run a map:

val jRequests_flat=clean_jRequests.map(line=>{((line(1),line(2),line(3),line(4),line(5),line(6),line(7),line(8).asInstanceOf[Iterable[String]].mkString(","),line(9) ,line(10) ,line(11) ,line(12) ,line(13) ,line(14) ,line(15) ,line(16) ,line(17) ,line(18) ,line(19) ,line(20) ,line(21) ,line(22) ,line(23) ,line(24) ,line(25) ,line(26) ,line(27) ,line(28) ,line(29) ,line(30) ,line(31) ,line(32) ,line(33) ,line(34) ,line(35) ,line(36) ,line(37) ,line(38) ,line(39) ,line(40) ,line(41) ,line(42) ,line(43) ,line(44) ,line(45) ,line(46) ,line(47) ,line(48) ,line(49) ,line(50)))})

Is there a smarter way to achieve that (modifying only a certain column without touching the others, but keeping all of them)?
The last statement fails because the tuple has too many members: :19: error: object Tuple50 is not a member of package scala

Thanks,
Daniel

score 0 · Answer 1 · edited May 23 '17 at 12:06

Do you need all 55 columns?

You can create a case class that holds just the columns you need and save this subset.

Of course, you can keep the original file with all the data in case you need it in the future.

You are getting the Tuple50 error because you are hitting Scala's tuple limit: the standard library only defines tuples up to Tuple22, so a 50-element tuple does not exist. See "Why does the Scala library only defines tuples up to Tuple22?"
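One way around the limit, sketched here in plain Scala without Spark, is to keep each row as a sequence and replace only the one position that needs changing. `FlatRow`, `remapColumn`, `flatten`, and the sample data are hypothetical names made up for illustration, not part of any Spark API. I'm assuming the nested column sits at index 8, as in the question. Turning the result back into rows with a schema for Parquet is not shown and depends on your Spark version.

```scala
// Minimal sketch: modify one column of a wide row without building a tuple.
// FlatRow is a hypothetical stand-in for a Spark row (an ordered sequence of values).
object RemapOneColumn {
  type FlatRow = Seq[Any]

  // Replace the value at `index` with f(old value); all other columns are kept as-is.
  def remapColumn(row: FlatRow, index: Int)(f: Any => Any): FlatRow =
    row.updated(index, f(row(index)))

  // Join a nested collection into a comma-separated string; pass other values through.
  def flatten(value: Any): Any = value match {
    case items: Iterable[_] => items.mkString(",")
    case other              => other
  }

  def main(args: Array[String]): Unit = {
    // Made-up sample: 55 columns with a nested collection at index 8 (assumption).
    val row: FlatRow =
      (0 until 55).map(i => if (i == 8) Seq("a", "b", "c") else s"v$i")

    val flat = remapColumn(row, 8)(flatten)
    println(flat.size) // 55
    println(flat(8))   // a,b,c
  }
}
```

Because a sequence has no arity limit, this works for 55 columns just as well as for 5, and the function only names the column it changes.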

answered Nov 25 '14 at 18:01

Soumya Simanta

Thanks, I do need all 55. There is probably a better way to do it than what I'm trying to do. – Daniel Haviv Nov 26 '14 at 09:07
