Total file size : 1 GB
Block size : 128 MB
Number of splits: 8
Creating a split for each block won’t work since it is impossible to start reading at an arbitrary point in the gzip stream and therefore impossible for a map task to read its split independently of the others. The gzip format uses DEFLATE to store the compressed data, and DEFLATE stores data as a series of compressed blocks. The problem is that the start of each block is not distinguished in any way. For this reason, gzip does not support splitting.
MapReduce will does not split the gzipped file, since it knows that the input is gzip-compressed (by looking at the filename extension) and that gzip does not support splitting. This will work, but at the expense of locality: a single map will process the 8 HDFS blocks, most of which will not be local to the map.
Have a look at : this article and section name: Issues about compression and input split
EDIT: ( for splittable uncompression)
BZip2 is a compression / De-Compression algorithm which does compression on blocks of data and later these compressed blocks can be decompressed independent of each other. This is indeed an opportunity that instead of one BZip2 compressed file going to one mapper, we can process chunks of file in parallel. The correctness criteria of such a processing is that for a bzip2 compressed file, each compressed block should be processed by only one mapper and ultimately all the blocks of the file should be processed. (By processing we mean the actual utilization of that un-compressed data (coming out of the codecs) in a mapper)
Source: https://issues.apache.org/jira/browse/HADOOP-4012