Questions tagged [sequencefile]

A SequenceFile is a Hadoop binary file containing key/value pairs.

A SequenceFile is a binary file format used by Hadoop for the efficient storage and retrieval of key/value pairs. Records can optionally be compressed, either per record or per block, for more compact storage.
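As a rough illustration of the layout described above, the sketch below reads the first fields of a SequenceFile header: the `SEQ` magic bytes, a version byte, the key and value class names, and the two compression flags. It assumes a version 6 header and class names shorter than 128 bytes (so each length prefix is a single byte); the helper names `read_short_string` and `parse_header` are hypothetical, and the sample bytes are built by hand rather than taken from a real file.

```python
import io


def read_short_string(stream):
    """Read a length-prefixed UTF-8 string.

    Assumption: the length fits in a single byte (0..127), which holds for
    typical class names. Longer strings use a multi-byte encoding that this
    sketch does not handle.
    """
    length = stream.read(1)[0]
    if length > 127:
        raise ValueError("multi-byte length prefix not supported in this sketch")
    return stream.read(length).decode("utf-8")


def parse_header(data):
    """Parse the leading fields of a SequenceFile header (version 6 assumed)."""
    stream = io.BytesIO(data)
    if stream.read(3) != b"SEQ":
        raise ValueError("not a SequenceFile: missing SEQ magic bytes")
    version = stream.read(1)[0]
    key_class = read_short_string(stream)
    value_class = read_short_string(stream)
    compressed = stream.read(1) != b"\x00"        # record or block compression on?
    block_compressed = stream.read(1) != b"\x00"  # block compression specifically?
    return {
        "version": version,
        "key_class": key_class,
        "value_class": value_class,
        "compressed": compressed,
        "block_compressed": block_compressed,
    }


def _hadoop_string(text):
    # Hand-built length-prefixed string for the synthetic sample below.
    raw = text.encode("utf-8")
    return bytes([len(raw)]) + raw


# Synthetic header: Text keys, BytesWritable values, record compression enabled.
sample = (
    b"SEQ" + bytes([6])
    + _hadoop_string("org.apache.hadoop.io.Text")
    + _hadoop_string("org.apache.hadoop.io.BytesWritable")
    + b"\x01\x00"
)
header = parse_header(sample)
print(header)
```

A real header continues with the codec class name (when compression is on), file metadata, and a sync marker, which this sketch leaves unread.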

For more information, see the API documentation or the wiki page.

157 questions
21
votes
3 answers

Advantages of SequenceFile over HDFS text file

What is the advantage of a Hadoop SequenceFile over an HDFS flat (text) file? In what way is a SequenceFile more efficient? Small files can be combined and written into a SequenceFile, but the same can be done with an HDFS text file. Need to know the…
hrkrshn
  • 213
  • 1
  • 2
  • 7
17
votes
1 answer

How to read/write protocol buffer messages with Apache Spark?

I want to read/write protocol buffer messages from/to HDFS with Apache Spark. I found these suggested ways: 1) Convert protobuf messages to JSON with Google's Gson library and then read/write them with Spark SQL. This solution is explained in this…
DAVID_ROA
  • 309
  • 1
  • 3
  • 18
10
votes
2 answers

Write and read raw byte arrays in Spark using SequenceFile

How do you write RDD[Array[Byte]] to a file using Apache Spark and read it back again?
samthebest
  • 30,803
  • 25
  • 102
  • 142
7
votes
1 answer

Using PySpark, read/write 2D images on the Hadoop file system

I want to be able to read/write images on an HDFS file system and take advantage of HDFS locality. I have a collection of images where each image is composed of 2D arrays of uint16 and basic additional information stored as an XML file. I…
MathiasOrtner
  • 583
  • 1
  • 4
  • 17
7
votes
6 answers

Hadoop MapReduce: java.lang.UnsatisfiedLinkError: org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy()Z

I am trying to write a Snappy block-compressed sequence file from a MapReduce job. I am using Hadoop 2.0.0-cdh4.5.0 and snappy-java 1.0.4.1. Here is my code: package jinvestor.jhouse.mr; import java.io.ByteArrayOutputStream; import…
msknapp
  • 1,595
  • 7
  • 22
  • 39
7
votes
1 answer

Extend SequenceFileInputFormat to include file name+offset

I would like to be able to create a custom InputFormat that reads sequence files, but additionally exposes the file path and offset within that file where the record is located. To take a step back, here's the use case: I have a sequence file…
Joe K
  • 18,204
  • 2
  • 36
  • 58
6
votes
2 answers

Handling Writables fully qualified name changes in Hadoop SequenceFile

I have a bunch of Hadoop SequenceFiles that have been written with a Writable subclass I wrote. Let's call it FishWritable. This Writable worked out well for a while, until I decided there was a need for a package renaming for clarity. So now the…
Alex A.
  • 2,646
  • 22
  • 36
6
votes
1 answer

Reading Hadoop SequenceFiles with Hive

I have some mapred data from the Common Crawl that I have stored in a SequenceFile format. I have tried repeatedly to use this data "as is" with Hive so I can query and sample it at various stages. But I always get the following error in my job…
codingmonk
  • 63
  • 1
  • 5
5
votes
1 answer

Converting CSV to SequenceFile

I have a CSV file which I would like to convert to a SequenceFile, which I would ultimately use to create NamedVectors to use in a clustering job. I've been using the seqdirectory command to try to make a SequenceFile, and then fed that output into…
Alison
  • 99
  • 2
  • 7
4
votes
0 answers

Best practice for storing protobuf serialized data in HDFS

What is the preferred way of storing protobuf-encoded data in HDFS? Currently I see two possible solutions: a) sequence files: storing the serialized/encoded binary data, i.e., the "byte[]", in the corresponding value of a sequence file. b)…
3
votes
1 answer

How to create a Hadoop sequence file in the local file system without a Hadoop installation?

Is it possible to create a Hadoop sequence file from Java alone, without installing Hadoop? I need a standalone Java program that creates a sequence file locally. My Java program will run in an environment that does not have Hadoop installed.
Sean Nguyen
  • 12,528
  • 22
  • 74
  • 113
3
votes
2 answers

NegativeArraySizeException when creating a SequenceFile with large (>1GB) BytesWritable value size

I have tried different ways to create a large Hadoop SequenceFile with just one short (<100 bytes) key but one large (>1GB) value (BytesWritable). The following sample works for…
3
votes
1 answer

How does the Mapper class identify a SequenceFile as the input file in Hadoop?

In one of my MapReduce tasks, I override the BytesWritable as KeyBytesWritable, and override the ByteWritable as ValueBytesWritable. Then I output the result using SequenceFileOutputFormat. My question is: when I start the next MapReduce task, I want to…
JoJo
  • 1,377
  • 3
  • 14
  • 28
3
votes
1 answer

What do sync and syncFs of SequenceFile.Writer mean?

Environment: Hadoop 0.20.2-cdh3u5. I am trying to upload log data (10G) to HDFS with a customized tool which uses SequenceFile.Writer. SequenceFile.Writer w = SequenceFile.createWriter( hdfs, conf, p, …
Evans Y.
  • 4,209
  • 6
  • 35
  • 43
3
votes
1 answer

How can I use Mahout's sequencefile API code?

Mahout has a command for creating a sequence file: bin/mahout seqdirectory -c UTF-8 -i -o . I want to use this command from code through the API.