We currently have a Pig implementation that generates sequence files from records, where some attributes of each record are treated as the key of the sequence file, and all records corresponding to that key are stored in one sequence file. As we are moving to Spark, I want to know how this can be done in Spark.
1 Answer
saveAsSequenceFile saves the data as a sequence file:
sc.parallelize(List(1, 2, 3, 4, 5)).map(x => (x, x * 10)).saveAsSequenceFile("/saw1")
$ hadoop fs -cat /saw1/part-00000
SEQ org.apache.hadoop.io.IntWritable org.apache.hadoop.io.IntWritable ... (binary payload)

The file is binary; the readable part of the header just names the key and value classes (here both IntWritable).
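Note that saveAsSequenceFile is only available (through an implicit conversion) on RDDs of key/value pairs whose types Spark can convert to Hadoop Writables, e.g. Int to IntWritable and String to Text. A minimal sketch with String pairs; the output path /saw2 is just an example:

// Strings are converted to org.apache.hadoop.io.Text implicitly
sc.parallelize(Seq(("a", "apple"), ("b", "ball"))).saveAsSequenceFile("/saw2")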
To read the sequence file back, use sc.sequenceFile. Note that Hadoop reuses Writable objects between records, so copy the values out before collecting:

import org.apache.hadoop.io.IntWritable
val sw = sc.sequenceFile("/saw1/part-00000", classOf[IntWritable], classOf[IntWritable]).map { case (k, v) => (k.get, v.get) }.collect
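The above covers writing and reading a single sequence file, but the original question asks for one sequence file per key. One way to get that is a custom Partitioner that gives every distinct key its own partition, so saveAsSequenceFile emits exactly one part file per key. This is a sketch rather than a drop-in implementation: KeyPartitioner, the sample records, and the output path are all made up, and it assumes the set of distinct keys is small enough to collect to the driver.

import org.apache.spark.Partitioner

// Hypothetical partitioner: one partition per distinct key
class KeyPartitioner(keys: Array[String]) extends Partitioner {
  private val index = keys.zipWithIndex.toMap
  override def numPartitions: Int = keys.length
  override def getPartition(key: Any): Int = index(key.asInstanceOf[String])
}

val records = sc.parallelize(Seq(("us", "alice"), ("uk", "bob"), ("us", "carol")))
val keys = records.keys.distinct.collect  // assumes a modest number of distinct keys

records
  .partitionBy(new KeyPartitioner(keys))  // all records for a key land in one partition
  .saveAsSequenceFile("/out/by-key")      // one part file per partition, hence per key

If you need the output files named after the key instead of part-00000, part-00001, and so on, an alternative is saveAsHadoopFile with a subclass of org.apache.hadoop.mapred.lib.MultipleSequenceFileOutputFormat that overrides generateFileNameForKeyValue.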

Ishan Kumar