Questions tagged [lzo]

LZO is a data compression algorithm from the Lempel-Ziv family, implemented as a library suitable for real-time de-/compression. This means it favours speed over compression ratio.

LZO is written in ANSI C. Both the source code and the compressed data format are designed to be portable across platforms.

LZO implements a number of algorithms with the following features:

- Decompression is simple and very fast, and requires no additional memory.
- Compression is pretty fast and requires 64 kB of memory.
- You can dial up extra compression at a speed cost in the compressor; the speed of the decompressor is not reduced.
- Includes compression levels for generating pre-compressed data which achieve a quite competitive compression ratio.
- There is also a compression level which needs only 8 kB for compression.
- The algorithm is thread safe.
- The algorithm is lossless.
- LZO supports overlapping compression and in-place decompression.

LZO and the LZO algorithms and implementations are distributed under the terms of the GNU General Public License (GPL).
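The "overlapping" and "in-place" properties come from the LZ77 scheme that the LZO algorithms build on: back-references copy from the already-produced output, so the output buffer itself serves as the dictionary. A toy sketch of that copy mechanic in Python (this illustrates the idea only; it is not the real LZO bit format):

```python
def lz_decompress(tokens):
    """Decompress a toy LZ77 token stream.

    Each token is either a literal byte (an int), or a
    (distance, length) pair that copies `length` bytes starting
    `distance` bytes back in the output produced so far. Copying
    byte by byte makes overlapping matches (distance < length)
    work naturally -- a long run can expand from a single literal.
    """
    out = bytearray()
    for tok in tokens:
        if isinstance(tok, int):       # literal byte
            out.append(tok)
        else:                          # back-reference into the output
            distance, length = tok
            for _ in range(length):
                out.append(out[-distance])
    return bytes(out)

# One literal 'a' followed by an overlapping copy expands into a run:
assert lz_decompress([ord("a"), (1, 5)]) == b"aaaaaa"
```

Because the copy loop reads only bytes it has already written, the decompressor needs no memory beyond the output buffer, which is the property the feature list above advertises.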

122 questions
40 votes, 5 answers

Spark SQL - difference between gzip vs snappy vs lzo compression formats

I am trying to use Spark SQL to write a Parquet file. By default Spark SQL supports gzip, but it also supports other compression formats like snappy and lzo. What is the difference between these compression formats?
Shankar
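For context, the Parquet codec is selected per session in Spark SQL. A hedged config sketch (it assumes an existing SparkSession named `spark`, a DataFrame `df`, and a placeholder output path):

```python
# Config fragment -- assumes a live SparkSession `spark` and a
# hypothetical DataFrame `df`; the output path is a placeholder.
spark.conf.set("spark.sql.parquet.compression.codec", "snappy")  # or "gzip", "lzo"
df.write.parquet("/tmp/output.parquet")
```

Broadly, gzip leans toward a better ratio at more CPU cost, while snappy and lzo lean toward speed; which wins depends on the workload, which is what the answers to this question compare.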
8 votes, 4 answers

What are lzo and lzf, and what are the differences?

I have heard of lzo and lzf, and it seems they are both compression algorithms. Are they the same thing? Are there any other algorithms like them (light and fast)?
Mickey Shine
8 votes, 2 answers

Decompressing a .lzo file using shell script

OK, so I did a fair bit of searching on the web and did not find any answers. I am writing a shell script in which I need to decompress a .lzo file, and I do not see any leads. Does anyone have any idea? I am basically reading a timestamped log file. My scripts…
Vikas
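One common route is to shell out to the lzop tool rather than reimplement the format. A minimal sketch, assuming lzop is installed (the helper names are my own, not from any library):

```python
import shutil
import subprocess

def lzop_decompress_cmd(path):
    """Build the argv for decompressing one .lzo file.

    -d decompresses; lzop keeps the input file by default,
    so the original archive is left in place.
    """
    return ["lzop", "-d", path]

def decompress_lzo(path):
    # Fail early with a clear message if the tool is missing.
    if shutil.which("lzop") is None:
        raise RuntimeError("lzop is not installed")
    subprocess.run(lzop_decompress_cmd(path), check=True)
```

From a shell script the equivalent is simply invoking `lzop -d` on the file; the Python wrapper above just adds an explicit check that the tool exists.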
7 votes, 3 answers

How to decompress lzo_deflate file?

I used LZO to compress reduce output. I tried this: the Hadoop-LZO project of Kevin Weil, and then used the LzoCodec class with my job: TextOutputFormat.setOutputCompressorClass(job, LzoCodec.class); Now compression works just fine. My problem is that…
Nadjib Mami
6 votes, 3 answers

Reverse Engineering: How do I identify an unknown compression method?

I'm with a group of modders attempting to reverse engineer and mod a Blu-Ray player. We're stuck because the firmware code seems to be compressed, and the decompression code is nowhere to be found. Presumably, the decompression is handled by…
tank
6 votes, 1 answer

How to Get Pig to Work with lzo Files?

So, I've seen a couple of tutorials for this online, but each seems to say to do something different. Also, none of them seems to specify whether you're trying to get things to work on a remote cluster, or to locally interact with a remote…
Eli
6 votes, 1 answer

Efficiently Storing the data in Hive

How can I efficiently store data in Hive, and also store and retrieve compressed data? Currently I am storing it as a TextFile. I was going through Bejoy's article and found that LZO compression would be good for storing the files, and also it…
arsenal
5 votes, 3 answers

Spark/Hadoop throws exception for large LZO files

I'm running an EMR Spark job on some LZO-compressed log-files stored in S3. There are several logfiles stored in the same folder, e.g.: ... s3://mylogfiles/2014-08-11-00111.lzo s3://mylogfiles/2014-08-11-00112.lzo ... In the spark-shell I'm running…
4 votes, 1 answer

Decompressing an LZO stream in PHP

I have a number of LZO-compressed log files on Amazon S3, which I want to read from PHP. The AWS SDK provides a nice StreamWrapper for reading these files efficiently, but since the files are compressed, I need to decompress the content before I can…
Jens Roland
4 votes, 1 answer

Hadoop-LZO strange native-lzo library not available error

I've installed the Cloudera Hadoop-LZO package and added the following settings into my client environment safety…
Carl Sagan
4 votes, 2 answers

Cloudera Manager: Where do I put Java ClassPath for MapReduce jobs?

I've got Hadoop-Lzo working happily on my local pseudo-cluster but the second I try the same jar file in production, I get: java.lang.RuntimeException: native-lzo library not available The libraries are verified to be on the DataNodes, so my…
Carl Sagan
4 votes, 4 answers

native-lzo library not available on Hadoop datanodes

I've written a simple LzoWordCount and added the following to my…
Carl Sagan
4 votes, 3 answers

Open an lzo file in python, without decompressing the file

I'm currently working on a 3rd-year project involving data from Twitter. The department have provided me with .lzo files of a month's worth of Twitter data. The smallest is 4.9 GB, and when decompressed it is 29 GB, so I'm trying to open the file and read as I'm…
DrugCrazed
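A common way to read such an archive without ever materialising the 29 GB decompressed file on disk is to pipe it through a decompressor and iterate the stream. A sketch, with the decompressor command parameterised so the same generator works with `lzop -dc` (assumed installed) or any other filter that writes to stdout:

```python
import subprocess

def stream_lines(path, cmd=("lzop", "-dc")):
    """Yield lines from `path` after piping it through `cmd`.

    With the default cmd this decompresses a .lzo archive on
    the fly; only one buffered chunk is held in memory at a time.
    """
    proc = subprocess.Popen([*cmd, path], stdout=subprocess.PIPE)
    try:
        for line in proc.stdout:
            yield line
    finally:
        proc.stdout.close()
        proc.wait()
```

The same pattern works for any size of archive, since the consumer only ever sees one line at a time.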
4 votes, 1 answer

When using LZO on Hadoop output on AWS EMR, does it index the files (stored on S3) for future automatic splitting?

I want to use LZO compression on my Elastic Map Reduce job's output that is being stored on S3, but it is not clear if the files are automatically indexed so that future jobs run on this data will split the files into multiple tasks. For example,…
Dolan Antenucci
4 votes, 1 answer

Snappy or LZO for logs then consumed by hadoop

I have a high volume service. I log events. Every few minutes, I zip the logs using gzip and rotate them to S3. From there, we process the logs using Amazon's Hadoop -- elastic mapreduce -- via Hive. Right now on the servers, we get a CPU spike…
John Hinnegan
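The CPU spike described here is often a compression-level problem before it is a codec problem: gzip at a low level already trades ratio for much less CPU, in the same direction as switching to snappy or LZO. A self-contained sketch using only the standard library to measure both sides of the tradeoff (the sample log line is made up for illustration):

```python
import gzip
import time

# A synthetic, highly repetitive log -- real logs compress similarly well.
log = b"2014-08-11 00:01:02 GET /index.html 200 1234\n" * 20000

for level in (9, 1):
    t0 = time.perf_counter()
    blob = gzip.compress(log, compresslevel=level)
    dt = time.perf_counter() - t0
    print(f"level {level}: {len(blob)} bytes, {dt * 1000:.1f} ms")

# Round-trip sanity check: decompression recovers the original bytes.
assert gzip.decompress(gzip.compress(log, compresslevel=1)) == log
```

Measuring snappy or LZO works the same way via their Python bindings; the point is to compare CPU time against size on your own log data before rotating codecs in production.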