
Currently we have an implementation in Pig that generates sequence files from records, where some attributes of a record are treated as the key of the sequence file and all the records corresponding to that key are stored in one sequence file. As we are moving to Spark, I want to know how this can be done in Spark.
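For illustration, one possible Spark equivalent (a minimal sketch, not a definitive solution: the class name SequenceFilePerKey, the sample data, and the /out path are invented, and it assumes string keys that are safe to use as file names) is to write through Hadoop's MultipleSequenceFileOutputFormat, overriding generateFileNameForKeyValue so that every record is routed to a sequence file named after its key:

import org.apache.hadoop.io.Text
import org.apache.hadoop.mapred.lib.MultipleSequenceFileOutputFormat
import org.apache.spark.HashPartitioner

// Hypothetical output format: names each output file after the record's key.
class SequenceFilePerKey extends MultipleSequenceFileOutputFormat[Text, Text] {
  override def generateFileNameForKeyValue(key: Text, value: Text, name: String): String =
    key.toString
}

val records = sc.parallelize(Seq(("k1", "row-a"), ("k2", "row-b"), ("k1", "row-c")))

records
  .partitionBy(new HashPartitioner(4))                 // keep each key within a single task
  .map { case (k, v) => (new Text(k), new Text(v)) }   // wrap in Writables for the Hadoop API
  .saveAsHadoopFile("/out", classOf[Text], classOf[Text], classOf[SequenceFilePerKey])

The partitionBy step matters here: it guarantees that all records for a given key land in one task, so two tasks never try to commit an output file with the same name.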

rk.the1

1 Answer


saveAsSequenceFile saves an RDD of key/value pairs as a Hadoop sequence file:

sc.parallelize(List(1, 2, 3, 4, 5)).map(x => (x, x * 10)).saveAsSequenceFile("/saw1")

$ hadoop fs -cat /saw1/part-00000
SEQ org.apache.hadoop.io.IntWritable org.apache.hadoop.io.IntWritable (binary-encoded records follow)
[cloudera@quickstart ~]$

To read the sequence file back, use sc.sequenceFile:

import org.apache.hadoop.io.IntWritable

val sw = sc.sequenceFile("/saw1/part-00000", classOf[IntWritable], classOf[IntWritable]).collect
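One caveat worth adding (noted in Spark's API docs for sequenceFile): Hadoop's RecordReader reuses the same Writable object for every record, so collecting the Writables directly can leave you with an array of references to a single reused object. Copying the values out with a map before collect avoids this, for example:

import org.apache.hadoop.io.IntWritable

val pairs = sc.sequenceFile("/saw1/part-00000", classOf[IntWritable], classOf[IntWritable])
  .map { case (k, v) => (k.get, v.get) }   // copy primitives out of the reused Writables
  .collect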
Ishan Kumar