Questions tagged [lzo]

LZO is a data compression algorithm from the Lempel-Ziv family, implemented as a library suitable for real-time de-/compression. This means it favours speed over compression ratio.

LZO is written in ANSI C. Both the source code and the compressed data format are designed to be portable across platforms.

LZO implements a number of algorithms with the following features:

- Decompression is simple and very fast, and requires no additional memory.
- Compression is pretty fast and requires 64 kB of memory.
- You can dial up extra compression at a speed cost in the compressor; the speed of the decompressor is not reduced.
- Includes compression levels for generating pre-compressed data which achieve a quite competitive compression ratio.
- There is also a compression level which needs only 8 kB for compression.
- The algorithm is thread safe.
- The algorithm is lossless.
- LZO supports overlapping compression and in-place decompression.

LZO and the LZO algorithms and implementations are distributed under the terms of the GNU General Public License (GPL).
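The "overlapping" and "in-place" properties come from the LZ77 scheme that the LZO algorithms build on: back-references copy from the already-produced output, so the output buffer itself serves as the dictionary. A toy sketch of that copy mechanic in Python (this illustrates the idea only; it is not the real LZO bit format):

```python
def lz_decompress(tokens):
    """Decompress a toy LZ77 token stream.

    Each token is either a literal byte (an int), or a
    (distance, length) pair that copies `length` bytes starting
    `distance` bytes back in the output produced so far. Copying
    byte by byte makes overlapping matches (distance < length)
    work naturally -- a long run can expand from a single literal.
    """
    out = bytearray()
    for tok in tokens:
        if isinstance(tok, int):       # literal byte
            out.append(tok)
        else:                          # back-reference into the output
            distance, length = tok
            for _ in range(length):
                out.append(out[-distance])
    return bytes(out)

# One literal 'a' followed by an overlapping copy expands into a run:
assert lz_decompress([ord("a"), (1, 5)]) == b"aaaaaa"
```

Because the copy loop reads only bytes it has already written, the decompressor needs no memory beyond the output buffer, which is the property the feature list above advertises.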

122 questions
40 votes, 5 answers

Spark SQL - difference between gzip vs snappy vs lzo compression formats

I am trying to use Spark SQL to write a Parquet file. By default Spark SQL supports gzip, but it also supports other compression formats like snappy and lzo. What is the difference between these compression formats?
Shankar
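For context, the Parquet codec is selected per session in Spark SQL. A hedged config sketch (it assumes an existing SparkSession named `spark`, a DataFrame `df`, and a placeholder output path):

```python
# Config fragment -- assumes a live SparkSession `spark` and a
# hypothetical DataFrame `df`; the output path is a placeholder.
spark.conf.set("spark.sql.parquet.compression.codec", "snappy")  # or "gzip", "lzo"
df.write.parquet("/tmp/output.parquet")
```

Broadly, gzip leans toward a better ratio at more CPU cost, while snappy and lzo lean toward speed; which wins depends on the workload, which is what the answers to this question compare.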
8 votes, 4 answers

What are lzo and lzf, and what are the differences?

I have heard of lzo and lzf, and it seems they are both compression algorithms. Are they the same thing? Are there any other algorithms like them (light and fast)?
Mickey Shine
8 votes, 2 answers

Decompressing a .lzo file using shell script

OK, so I did a fair bit of searching on the web and did not find any answers. I am writing a shell script in which I need to decompress a .lzo file, and I do not see any leads. Does anyone have any idea? I am basically reading a timestamped log file. My scripts…
Vikas
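One common route is to shell out to the lzop tool rather than reimplement the format. A minimal sketch, assuming lzop is installed (the helper names are my own, not from any library):

```python
import shutil
import subprocess

def lzop_decompress_cmd(path):
    """Build the argv for decompressing one .lzo file.

    -d decompresses; lzop keeps the input file by default,
    so the original archive is left in place.
    """
    return ["lzop", "-d", path]

def decompress_lzo(path):
    # Fail early with a clear message if the tool is missing.
    if shutil.which("lzop") is None:
        raise RuntimeError("lzop is not installed")
    subprocess.run(lzop_decompress_cmd(path), check=True)
```

From a shell script the equivalent is simply invoking `lzop -d` on the file; the Python wrapper above just adds an explicit check that the tool exists.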
7 votes, 3 answers

How to decompress lzo_deflate file?

I used LZO to compress reduce output. I tried this: the Hadoop-LZO project of Kevin Weil, and then used the LzoCodec class with my job: TextOutputFormat.setOutputCompressorClass(job, LzoCodec.class); Now compression works just fine. My problem is that…
Nadjib Mami
6 votes, 3 answers

Reverse Engineering: How do I identify an unknown compression method?

I'm with a group of modders attempting to reverse engineer and mod a Blu-Ray player. We're stuck because the firmware code seems to be compressed, and the decompression code is nowhere to be found. Presumably, the decompression is handled by…
tank
6 votes, 1 answer

How to Get Pig to Work with lzo Files?

So, I've seen a couple of tutorials for this online, but each seems to say to do something different. Also, none of them seems to specify whether you're trying to get things to work on a remote cluster, or to locally interact with a remote…
Eli
6 votes, 1 answer

Efficiently Storing the data in Hive

How can I efficiently store data in Hive, and also store and retrieve compressed data? Currently I am storing it as a TextFile. I was going through Bejoy's article and found that LZO compression would be good for storing the files, and also it…
arsenal
5 votes, 3 answers

Spark/Hadoop throws exception for large LZO files

I'm running an EMR Spark job on some LZO-compressed log-files stored in S3. There are several logfiles stored in the same folder, e.g.: ... s3://mylogfiles/2014-08-11-00111.lzo s3://mylogfiles/2014-08-11-00112.lzo ... In the spark-shell I'm running…
4 votes, 1 answer

Decompressing an LZO stream in PHP

I have a number of LZO-compressed log files on Amazon S3, which I want to read from PHP. The AWS SDK provides a nice StreamWrapper for reading these files efficiently, but since the files are compressed, I need to decompress the content before I can…
Jens Roland
4 votes, 1 answer

Hadoop-LZO strange native-lzo library not available error

I've installed the Cloudera Hadoop-LZO package and added the following settings into my client environment safety…
Carl Sagan
4 votes, 2 answers

Cloudera Manager: Where do I put Java ClassPath for MapReduce jobs?

I've got Hadoop-Lzo working happily on my local pseudo-cluster but the second I try the same jar file in production, I get: java.lang.RuntimeException: native-lzo library not available The libraries are verified to be on the DataNodes, so my…
Carl Sagan
4 votes, 4 answers

native-lzo library not available on Hadoop datanodes

I've written a simple LzoWordCount and added the following to my…
Carl Sagan
4 votes, 3 answers

Open an lzo file in python, without decompressing the file

I'm currently working on a 3rd-year project involving data from Twitter. The department have provided me with .lzo files of a month's worth of Twitter data. The smallest is 4.9 GB, and when decompressed it is 29 GB, so I'm trying to open the file and read as I'm…
DrugCrazed
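A common way to read such an archive without ever materialising the 29 GB decompressed file on disk is to pipe it through a decompressor and iterate the stream. A sketch, with the decompressor command parameterised so the same generator works with `lzop -dc` (assumed installed) or any other filter that writes to stdout:

```python
import subprocess

def stream_lines(path, cmd=("lzop", "-dc")):
    """Yield lines from `path` after piping it through `cmd`.

    With the default cmd this decompresses a .lzo archive on
    the fly; only one buffered chunk is held in memory at a time.
    """
    proc = subprocess.Popen([*cmd, path], stdout=subprocess.PIPE)
    try:
        for line in proc.stdout:
            yield line
    finally:
        proc.stdout.close()
        proc.wait()
```

The same pattern works for any size of archive, since the consumer only ever sees one line at a time.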
4 votes, 1 answer

When using LZO on Hadoop output on AWS EMR, does it index the files (stored on S3) for future automatic splitting?

I want to use LZO compression on my Elastic Map Reduce job's output that is being stored on S3, but it is not clear if the files are automatically indexed so that future jobs run on this data will split the files into multiple tasks. For example,…
Dolan Antenucci
4 votes, 1 answer

Snappy or LZO for logs then consumed by hadoop

I have a high volume service. I log events. Every few minutes, I zip the logs using gzip and rotate them to S3. From there, we process the logs using Amazon's Hadoop -- elastic mapreduce -- via Hive. Right now on the servers, we get a CPU spike…
John Hinnegan
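The CPU spike described here is often a compression-level problem before it is a codec problem: gzip at a low level already trades ratio for much less CPU, in the same direction as switching to snappy or LZO. A self-contained sketch using only the standard library to measure both sides of the tradeoff (the sample log line is made up for illustration):

```python
import gzip
import time

# A synthetic, highly repetitive log -- real logs compress similarly well.
log = b"2014-08-11 00:01:02 GET /index.html 200 1234\n" * 20000

for level in (9, 1):
    t0 = time.perf_counter()
    blob = gzip.compress(log, compresslevel=level)
    dt = time.perf_counter() - t0
    print(f"level {level}: {len(blob)} bytes, {dt * 1000:.1f} ms")

# Round-trip sanity check: decompression recovers the original bytes.
assert gzip.decompress(gzip.compress(log, compresslevel=1)) == log
```

Measuring snappy or LZO works the same way via their Python bindings; the point is to compare CPU time against size on your own log data before rotating codecs in production.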