Highest Voted 'sequencefile' Questions

21

votes

3 answers

Advantages of Sequence file over hdfs textfile

What is the advantage of a Hadoop Sequence File over an HDFS flat (text) file? In what way is a Sequence File more efficient? Small files can be combined and written into a sequence file, but the same can be done with an HDFS text file. Need to know the…

hadoop hdfs sequencefile

asked Aug 02 '12 at 13:40

hrkrshn

213
1
2
7

17

votes

1 answer

How to read/write protocol buffer messages with Apache Spark?

I want to read/write protocol buffer messages from/to HDFS with Apache Spark. I found these suggested ways: 1) Convert protobuf messages to JSON with Google's Gson library and then read/write them with Spark SQL. This solution is explained in this…

apache-spark hdfs protocol-buffers sequencefile

asked Aug 30 '18 at 11:59

DAVID_ROA

309
1
3
18

10

votes

2 answers

Write and read raw byte arrays in Spark - using SequenceFile

How do you write RDD[Array[Byte]] to a file using Apache Spark and read it back again?

scala hadoop hdfs apache-spark sequencefile

asked Jun 06 '14 at 13:42

samthebest

30,803
25
102
142

7

votes

1 answer

using pyspark, read/write 2D images on hadoop file system

I want to be able to read/write images on an HDFS file system and take advantage of HDFS locality. I have a collection of images where each image is composed of 2D arrays of uint16, plus basic additional information stored as an XML file. I…

hadoop apache-spark sequencefile pyspark

asked Feb 25 '15 at 22:46

MathiasOrtner

583
1
4
17

7

votes

6 answers

hadoop mapreduce: java.lang.UnsatisfiedLinkError: org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy()Z

I am trying to write a Snappy block-compressed sequence file from a MapReduce job. I am using Hadoop 2.0.0-cdh4.5.0 and snappy-java 1.0.4.1. Here is my code: package jinvestor.jhouse.mr; import java.io.ByteArrayOutputStream; import…

java hadoop mapreduce sequencefile snappy

asked Mar 03 '14 at 15:16

msknapp

1,595
7
22
39

7

votes

1 answer

Extend SequenceFileInputFormat to include file name+offset

I would like to be able to create a custom InputFormat that reads sequence files, but additionally exposes the file path and offset within that file where the record is located. To take a step back, here's the use case: I have a sequence file…

java hadoop mapreduce sequencefile

asked Sep 05 '13 at 17:52

Joe K

18,204
2
36
58

6

votes

2 answers

Handling Writables fully qualified name changes in Hadoop SequenceFile

I have a bunch of Hadoop SequenceFiles that were written with a Writable subclass I wrote. Let's call it FishWritable. This Writable worked well for a while, until I decided a package rename was needed for clarity. So now the…

serialization hadoop sequencefile

asked Sep 19 '13 at 00:55

Alex A.

2,646
22
36

6

votes

1 answer

Reading Hadoop SequenceFiles with Hive

I have some mapred data from the Common Crawl that I have stored in a SequenceFile format. I have tried repeatedly to use this data "as is" with Hive so I can query and sample it at various stages. But I always get the following error in my job…

hive sequencefile

asked Nov 02 '12 at 22:16

codingmonk

63
1
5

5

votes

1 answer

Converting CSV to SequenceFile

I have a CSV file which I would like to convert to a SequenceFile, which I would ultimately use to create NamedVectors to use in a clustering job. I've been using the seqdirectory command to try to make a SequenceFile, and then fed that output into…

hadoop mahout sequencefile

asked Aug 16 '12 at 20:25

Alison

99
2
7

4

votes

0 answers

Best practice for storing protobuf serialized data in HDFS

What is the preferred way of storing protobuf-encoded data in HDFS? Currently I see two possible solutions: a) sequence files: storing the serialized/encoded binary data, i.e., the "byte[]", in the corresponding value of a sequence file. b)…

protocol-buffers apache-kafka parquet sequencefile

asked Aug 26 '15 at 15:27

Thomas Beer

230
3
9

3

votes

1 answer

How to create hadoop sequence file in local file system without hadoop installation?

Is it possible to create a Hadoop sequence file from Java alone, without installing Hadoop? I need a standalone Java program that creates a sequence file locally. My Java program will run in an environment that does not have Hadoop installed.

hadoop sequencefile

asked May 15 '15 at 09:39

Sean Nguyen

12,528
22
74
113

3

votes

2 answers

NegativeArraySizeException when creating a SequenceFile with large (>1GB) BytesWritable value size

I have tried different ways to create a large Hadoop SequenceFile with just one short (<100 bytes) key but one large (>1 GB) value (BytesWritable). The following sample works for…

hadoop out-of-memory heap-memory large-files sequencefile

asked Jun 09 '14 at 19:14

user815613

95
3
8

3

votes

1 answer

How does Mapper class identify the SequenceFile as inputfile in hadoop?

In one of my MapReduce tasks, I override BytesWritable as KeyBytesWritable and ByteWritable as ValueBytesWritable. Then I output the result using SequenceFileOutputFormat. My question is: when I start the next MapReduce task, I want to…

hadoop mapper sequencefile

asked Mar 02 '13 at 21:08

JoJo

1,377
3
14
28

3

votes

1 answer

What do sync and syncFs of SequenceFile.Writer mean?

Environment: Hadoop 0.20.2-cdh3u5. I am trying to upload log data (10G) to HDFS with a customized tool that uses SequenceFile.Writer. SequenceFile.Writer w = SequenceFile.createWriter( hdfs, conf, p, …

hadoop hdfs sequencefile

asked Sep 24 '12 at 03:10

Evans Y.

4,209
6
35
43

3

votes

1 answer

How can I use Mahout's sequencefile API code?

Mahout has a command for creating a sequence file: bin/mahout seqdirectory -c UTF-8 -i -o . I want to use this command through the code API.

hadoop mahout sequencefile

asked Jul 25 '12 at 08:06

Arash Hosseinabady

31
1
6
