Questions tagged [hadoop-lzo]

Hadoop-LZO is a project to bring splittable LZO compression to Hadoop. LZO is an ideal compression format for Hadoop due to its combination of speed and compression ratio. However, LZO files are not natively splittable, which means the parallelism that is at the core of Hadoop is lost. This project re-enables that parallelism for LZO-compressed files, and it also comes with standard utilities (input/output streams, etc.) for working with LZO files.
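
The splittable read works in two steps: a .lzo.index side file is first generated with the bundled indexer (typically something like hadoop jar /path/to/hadoop-lzo.jar com.hadoop.compression.lzo.DistributedLzoIndexer <path>), and the data is then read through the input formats the project ships. Below is a minimal Scala driver sketch; the driver name is made up, and it assumes the hadoop-lzo jar and its native library are available on the cluster.

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.Path
    import org.apache.hadoop.io.{LongWritable, Text}
    import org.apache.hadoop.mapreduce.Job
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat
    import com.hadoop.mapreduce.LzoTextInputFormat // shipped by hadoop-lzo

    object LzoJobDriver { // hypothetical driver name
      def main(args: Array[String]): Unit = {
        val conf = new Configuration()
        // Register the LZO codecs so Hadoop can decompress .lzo input.
        conf.set("io.compression.codecs",
          "org.apache.hadoop.io.compress.DefaultCodec," +
            "com.hadoop.compression.lzo.LzoCodec,com.hadoop.compression.lzo.LzopCodec")
        conf.set("io.compression.codec.lzo.class", "com.hadoop.compression.lzo.LzoCodec")

        val job = Job.getInstance(conf, "lzo-example")
        job.setJarByClass(LzoJobDriver.getClass)
        // With a .lzo.index side file present, LzoTextInputFormat produces one
        // split per indexed block instead of one split for the whole file.
        // (Mapper/reducer setup is omitted; the defaults are identity classes.)
        job.setInputFormatClass(classOf[LzoTextInputFormat])
        FileInputFormat.addInputPath(job, new Path(args(0)))
        FileOutputFormat.setOutputPath(job, new Path(args(1)))
        System.exit(if (job.waitForCompletion(true)) 0 else 1)
      }
    }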

21 questions
14 votes, 4 answers

Class com.hadoop.compression.lzo.LzoCodec not found for Spark on CDH 5?

I have been working on this problem for two days and still have not found a way. Problem: our Spark, installed via the newest CDH 5, always complains about the missing LzoCodec class, even after I installed HADOOP_LZO through Parcels in Cloudera… (a configuration sketch follows this entry)
caesar0301 • 1,913
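
In situations like the question above, the usual culprit is that the hadoop-lzo jar (and its native library) are not on Spark's driver and executor classpaths (e.g. via spark.driver.extraClassPath / spark.executor.extraClassPath or --jars). Here is a small, hedged Scala sketch for registering the codec and failing fast if the class still cannot be loaded; the application name is made up.

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("lzo-codec-check") // hypothetical app name
      // "spark.hadoop.*" properties are copied into the job's Hadoop Configuration.
      .set("spark.hadoop.io.compression.codecs",
        "org.apache.hadoop.io.compress.DefaultCodec," +
          "com.hadoop.compression.lzo.LzoCodec,com.hadoop.compression.lzo.LzopCodec")
    val sc = new SparkContext(conf)

    // Throws ClassNotFoundException immediately if hadoop-lzo is still missing
    // from the classpath -- registering the codec name alone is not enough.
    Class.forName("com.hadoop.compression.lzo.LzoCodec")
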
4 votes, 1 answer

Read uncompressed Thrift files in Spark

I'm trying to get Spark to read uncompressed Thrift files from S3. So far it has not been working. Data is loaded into S3 as uncompressed Thrift files; the source is AWS Kinesis Firehose. I have a tool that deserializes the files with no problem, so I…
Martin Klosi • 3,098
3 votes, 1 answer

Trying to use LZO Compression with MapReduce

I want to use LZO compression in MapReduce, but I am getting an error when I run my MapReduce job. I am using Ubuntu with a Java program, and I am only trying to run this on my local machine. My initial error is ERROR lzo.GPLNativeCodeLoader: Could not…
Matt Cremeens • 4,951
2 votes, 1 answer

How does file compression format affect my Spark processing?

I am confused about splittable and non-splittable file formats in the big data world. I was using the ZIP file format, and I understood that ZIP files are non-splittable in the sense that when I processed such a file I had to use ZipFileInputFormat… (a small read sketch follows this entry)
user9175539
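
To make the splittability difference concrete, here is a small Spark/Scala sketch. The path and application name are made up, and it assumes the .lzo file has already been indexed with hadoop-lzo so a .lzo.index side file sits next to it.

    import com.hadoop.mapreduce.LzoTextInputFormat
    import org.apache.hadoop.io.{LongWritable, Text}
    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("lzo-splits").getOrCreate()

    // A .gz or .zip input normally arrives as a single partition; an indexed
    // .lzo file read through LzoTextInputFormat gets one partition per block.
    val lines = spark.sparkContext
      .newAPIHadoopFile[LongWritable, Text, LzoTextInputFormat]("hdfs:///data/events.lzo")
      .map { case (_, text) => text.toString }

    println(s"number of input partitions: ${lines.getNumPartitions}")
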
2 votes, 1 answer

Why does my LZO indexing take so long on Amazon's EMR when reading from S3?

I have a 30 GB LZO file on S3, and I'm using hadoop-lzo to index it with Amazon EMR (AMI v2.4.2) in region us-east1. elastic-mapreduce --create --enable-debugging \ --ami-version "latest" \ --log-uri s3n://mybucket/mylogs \ --name…
Dolan Antenucci • 15,432
1 vote, 0 answers

Preparing LZO or LZ4 files for Spark

I'm trying to choose the right format for file exchange with my Spark application. I use Spark 2.4.7 + Hadoop 2.10 on Kubernetes. My app downloads a CSV file from S3 and processes it. The file is provided by a third-party company. I was thinking about…
Matzz • 670
1 vote, 1 answer

native-lzo not available error | Windows 10 | Java

Exception in thread "main" java.lang.reflect.InvocationTargetException at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at…
1 vote, 1 answer

Compression codec com.hadoop.compression.lzo.LzoCodec was not found

Trying to run a MapReduce job with compression: hadoop jar \ /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar \ randomtextwriter \ -Ddfs.replication=1 -Dmapreduce.output.fileoutputformat.compress=true…
1 vote, 1 answer

Java Hadoop-lzo Found interface but class was expected LzoTextInputFormat

I'm trying to use the Hadoop-LZO package (built using the steps here). It seems like everything worked successfully, as I was able to convert my LZO files to indexed files via the following (this returns big_file.lzo.index as expected): hadoop jar…
Sal • 1,653
1 vote, 0 answers

Reading Avro container files in Spark

I am working on a scenario where I need to read Avro container files from HDFS and do analysis using Spark. Input files directory: hdfs:///user/learner/20151223/.lzo* Note: the input Avro files are LZO-compressed. val df =…
Govind • 419
1 vote, 1 answer

LZO codec difference between Python and Java

I am running into a strange problem: I fail to inflate/decompress LZO-compressed data in Java that was deflated/compressed with the Python lzo module, even though both appear to use the same native LZO codec implementation. To give more details, I am…
user352951 • 271
0 votes, 1 answer

How to decompress an LZO file using Java (using the lzo-core library)

I am running into an issue while trying to decompress an LZO file using Java. Below I have pasted the code and the error; can someone please help me with this? import org.anarres.lzo.*; import java.io.*; public class… (a hedged decompression sketch follows this entry)
Pritam007 • 31
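
Since that question already imports org.anarres.lzo, here is a hedged Scala sketch in the same direction (Scala for consistency with the other examples). The file names are hypothetical, and it assumes the input carries an lzop header (i.e. it was written by the lzop tool or by hadoop-lzo's LzopCodec), which is what LzopInputStream expects; for raw LZO1X block data, the library's LzoInputStream/LzoDecompressor classes are the relevant entry points instead.

    import java.io.{BufferedInputStream, FileInputStream, FileOutputStream}
    import org.anarres.lzo.LzopInputStream

    // Hypothetical paths.
    val in = new LzopInputStream(
      new BufferedInputStream(new FileInputStream("big_file.lzo")))
    val out = new FileOutputStream("big_file.txt")

    // Stream the decompressed bytes straight to the output file.
    val buf = new Array[Byte](64 * 1024)
    Iterator.continually(in.read(buf)).takeWhile(_ != -1).foreach(n => out.write(buf, 0, n))

    in.close()
    out.close()
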
0 votes, 0 answers

Hive cannot find LZO codec

An error occurred when executing select * from xxx: Failed with exception java.io.IOException: java.io.IOException: No LZO codec found, cannot run. Troubleshooting done: checked that hadoop-lzo.jar is located in $HADOOP_HOME/share/hadoop/common for all Hadoop…
Steven • 21
0 votes, 1 answer

Reading LZO file of json lines in Spark DataFrames

I have a large indexed LZO file in HDFS that I would like to read into Spark DataFrames. The file contains lines of JSON documents. posts_dir='/data/2016/01' posts_dir contains the following: /data/2016/01/posts.lzo /data/2016/01/posts.lzo.index The… (a read sketch follows this entry)
Majid Alfifi • 568
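
A hedged way to approach that question, assuming Spark 2.2+ (where spark.read.json accepts a Dataset[String]); the directory follows the question, and the application name is made up.

    import com.hadoop.mapreduce.LzoTextInputFormat
    import org.apache.hadoop.io.{LongWritable, Text}
    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("lzo-json").getOrCreate()
    import spark.implicits._

    val postsDir = "/data/2016/01" // as in the question

    // The .lzo.index side file lets LzoTextInputFormat split posts.lzo,
    // so the JSON lines are read in parallel rather than as one big split.
    val jsonLines = spark.sparkContext
      .newAPIHadoopFile[LongWritable, Text, LzoTextInputFormat](postsDir)
      .map { case (_, line) => line.toString }

    val posts = spark.read.json(jsonLines.toDS())
    posts.printSchema()
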
0 votes, 1 answer

Hadoop lzo single split after index

I have an LZO-compressed file /data/mydata.lzo and want to run it through some MapReduce code I have. I first create an index file using the hadoop-lzo package with the following command: >> hadoop jar hadoop-lzo-0.4.21.jar \ …
Sal • 1,653
Page 1 of 2