Questions tagged [bz2]

For issues relating to bz2, the file extension used for files compressed with bzip2.

Files compressed with bzip2 are frequently given the bz2 extension. bunzip2 should be used to decompress these files.

tar supports bzip2 via the -j option, which can be used to create or extract bzip2-compressed archives.
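
For reference, both operations can also be done from Python's standard library; a minimal sketch with placeholder file names:

    import bz2
    import shutil
    import tarfile

    # Decompress a single .bz2 file (roughly what `bunzip2 -k example.bz2` does).
    with bz2.open("example.bz2", "rb") as src, open("example", "wb") as dst:
        shutil.copyfileobj(src, dst)

    # Extract a bzip2-compressed tar archive (roughly `tar -xjf archive.tar.bz2`).
    with tarfile.open("archive.tar.bz2", "r:bz2") as tar:
        tar.extractall("output_dir")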

See also the bzip2 tag.

106 questions
11 votes • 4 answers

Python decompression relative performance?

TLDR: Of the various compression algorithms available in Python (gzip, bz2, lzma, etc.), which has the best decompression performance? Full discussion: Python 3 has various modules for compressing/decompressing data, including gzip, bz2 and lzma. gzip…
ibrewster • 3,482 • 5 • 42 • 54
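
A rough, self-contained way to compare the three stdlib codecs asked about above; the payload is a synthetic placeholder, so real data may rank differently:

    import bz2
    import gzip
    import lzma
    import time

    payload = b"some repetitive sample data " * 1_000_000  # synthetic placeholder

    for name, mod in (("gzip", gzip), ("bz2", bz2), ("lzma", lzma)):
        blob = mod.compress(payload)
        start = time.perf_counter()
        mod.decompress(blob)
        elapsed = time.perf_counter() - start
        print(f"{name}: {len(blob):>10} compressed bytes, {elapsed:.3f}s to decompress")
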
8 votes • 4 answers

Reading the first lines of bz2 files in Python

I am trying to extract the first 10,000 lines from a bz2 file. import bz2 file = "file.bz2" file_10000 = "file.txt" output_file = codecs.open(file_10000,'w+','utf-8') source_file = bz2.open(file, "r") count = 0 for line in…
student • 511 • 1 • 5 • 20
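
A minimal sketch for the question above, keeping the question's file names; itertools.islice stops after 10,000 lines so the rest of the file is never decompressed:

    import bz2
    import itertools

    source = "file.bz2"    # file names taken from the question
    target = "file.txt"

    with bz2.open(source, "rt", encoding="utf-8") as src, \
            open(target, "w", encoding="utf-8") as dst:
        dst.writelines(itertools.islice(src, 10000))
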
7 votes • 2 answers

Spark: difference when reading in .gz and .bz2

I normally read and write files in Spark using .gz, where the number of files should be the same as the number of RDD partitions, i.e. one giant .gz file will be read into a single partition. However, if I read in one single .bz2, would I still get…
Edamame • 23,718 • 73 • 186 • 320
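
A quick way to check the partitioning difference asked about above; a sketch assuming a local SparkContext and hypothetical file paths:

    from pyspark import SparkContext

    sc = SparkContext.getOrCreate()

    gz_rdd = sc.textFile("logs/big_file.gz")    # gzip is not splittable:
    print(gz_rdd.getNumPartitions())            # the whole file lands in one partition

    bz2_rdd = sc.textFile("logs/big_file.bz2")  # bzip2 is a splittable codec:
    print(bz2_rdd.getNumPartitions())           # Spark can split it across partitions
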
6 votes • 1 answer

How can I extract a bz2 file in Java on Android?

How can I extract a bz2 file in Java on Android? Are there any libraries included in Android?
bordeux • 612 • 1 • 8 • 23
6 votes • 1 answer

List all files in a .tar.bz2, sorted by size

I use this command to list all files in an archive: tar jtvf blah.tar.bz2 How to list them sorted by size? Or list only the biggest files (i.e. files bigger than, say, 10MB)?
Basj • 41,386 • 99 • 383 • 673
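
A Python alternative (rather than a shell one-liner) for the listing question above, using the stdlib tarfile module; "blah.tar.bz2" is the archive name used in the question:

    import tarfile

    with tarfile.open("blah.tar.bz2", "r:bz2") as tar:
        files = [m for m in tar.getmembers() if m.isfile()]

    # largest first, keeping only members bigger than 10 MB
    for m in sorted(files, key=lambda m: m.size, reverse=True):
        if m.size > 10 * 1024 * 1024:
            print(f"{m.size:>12}  {m.name}")
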
5 votes • 2 answers

How to read lines from arbitrary BZ2 streams for CSV?

The bz2 module provides a standard open() method from which one can call readline(). However, my situation is one where I have a stream (pointing to a large amount of data) that I want to decompress lines from on the fly. My current implementation…
Neil C. Obremski • 18,696 • 24 • 83 • 112
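
One possible approach to the streaming question above: bz2.open also accepts an existing file-like object, so the decompressed text can be fed straight to csv.reader. The helper name and the in-memory stream below are illustrative only:

    import bz2
    import csv
    import io

    def csv_rows_from_bz2_stream(binary_stream):
        """Decompress a bz2 binary stream on the fly and yield CSV rows."""
        # bz2.open accepts an open file object as well as a file name
        text = bz2.open(binary_stream, "rt", encoding="utf-8", newline="")
        yield from csv.reader(text)

    # illustrative in-memory stream; any file-like object would do
    raw = io.BytesIO(bz2.compress(b"a,b,c\n1,2,3\n"))
    for row in csv_rows_from_bz2_stream(raw):
        print(row)
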
4 votes • 2 answers

Python: Convert Raw String to Bytes String without adding escape characters

I have a string: 'BZh91AY&SYA\xaf\x82\r\x00\x00\x01\x01\x80\x02\xc0\x02\x00 \x00!\x9ah3M\x07<]\xc9\x14\xe1BA\x06\xbe\x084' And I want: b'BZh91AY&SYA\xaf\x82\r\x00\x00\x01\x01\x80\x02\xc0\x02\x00 \x00!\x9ah3M\x07<]\xc9\x14\xe1BA\x06\xbe\x084' But I…
Bryan Yao • 65 • 2 • 7
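
For the conversion above, a common trick is a 'latin-1' encode, which works when every character's code point is below 256; a sketch using the questioner's string:

    text = 'BZh91AY&SYA\xaf\x82\r\x00\x00\x01\x01\x80\x02\xc0\x02\x00 \x00!\x9ah3M\x07<]\xc9\x14\xe1BA\x06\xbe\x084'

    # 'latin-1' maps each code point 0-255 straight to the byte with the same
    # value, so no extra escaping is introduced
    data = text.encode('latin-1')
    print(data[:10])   # b'BZh91AY&SY' -- the start of a bzip2 stream
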
4 votes • 1 answer

Boost 1.59 not decompressing all bzip2 streams

I've been trying to decompress some .bz2 files on the fly and line-by-line, so to speak, as the files I'm dealing with are massive when uncompressed (in the region of 100 GB), so I wanted a solution that saves disk space. I have no problems…
Primalfido • 53 • 4
3 votes • 0 answers

too many values to unpack (expected 2) in Jupyter

I use compressed pickle to save the results from sklearn gridsearch using the following code. import pickle import bz2 from sklearn.model_selection import RandomizedSearchCV search = RandomizedSearchCV(estimator, param_distributions=param_dist, …
kaidi • 31 • 2
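
A minimal round-trip sketch for compressed pickling as in the question above; the results object and file name are placeholders. One common cause of the unpacking error is assigning pickle.load's single return value to two names:

    import bz2
    import pickle

    results = {"best_score": 0.9, "best_params": {"n_estimators": 100}}  # placeholder

    # save with bz2 compression
    with bz2.BZ2File("search_results.pkl.bz2", "wb") as f:
        pickle.dump(results, f)

    # pickle.load returns exactly one object; unpacking it into two names only
    # works if that object is itself a 2-item sequence
    with bz2.BZ2File("search_results.pkl.bz2", "rb") as f:
        restored = pickle.load(f)
    print(restored)
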
3 votes • 3 answers

How to parse a Wikidata JSON (.bz2) file using Python?

I want to look at entities and relationships using Wikidata. I downloaded the Wikidata JSON dump (from here .bz2 file, size ~ 18 GB). However, I cannot open the file, it's just too big for my computer. Is there a way to look into the file without…
pajamas • 1,194 • 1 • 12 • 25
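
A streaming sketch for the dump question above, assuming the usual layout of the Wikidata JSON dump (one entity per line inside a single JSON array); the dump file name is illustrative:

    import bz2
    import json

    with bz2.open("latest-all.json.bz2", "rt", encoding="utf-8") as dump:
        for line in dump:
            line = line.strip()
            if line in ("[", "]"):      # the dump is one big JSON array
                continue
            entity = json.loads(line.rstrip(","))
            print(entity["id"], entity.get("labels", {}).get("en", {}).get("value"))
            break                       # drop the break to walk the whole dump
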
3 votes • 0 answers

How does Spark handle HDFS compressed files and how to choose an HDFS compression codec (splittable or not splittable)

Background: We have a project that uses Spark to process some log/csv files; each file is very large, for example 20GB, so we need to compress the log/csv files. Example: the HDFS block size is 128M and we have a 1GB log file. If the file is not compressed,…
pxchen • 51 • 1 • 4
3 votes • 1 answer

How to capture tcpdump output to a compressed file in Linux

I have a DNS server and I want to capture DNS traffic to get all the IPs which use my DNS server. For this I started using the following tcpdump command to capture the traffic to a file: tcpdump -n -i eth0 dst port 53 >> dns_data.log But the file size is high…
Yasiru G • 6,886 • 6 • 23 • 43
2 votes • 1 answer

Python 3 bz2 huge file and progress

I'm implementing a tool that parses a huge, 248GB set of files compressed in bz2 format. The average compression factor is 0.04, so decompressing them to over 6 terabytes beforehand is quite out of the question. Each line of the content files is a…
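
One way to report progress for the question above is to open the raw file yourself and track how many compressed bytes have been consumed; the figure is approximate because the decompressor reads ahead, and the file name is a placeholder:

    import bz2
    import os

    path = "huge_input.bz2"          # placeholder file name
    total = os.path.getsize(path)

    # open the raw file explicitly so its tell() reports compressed bytes consumed;
    # bz2.open also accepts an already-open file object
    with open(path, "rb") as raw, bz2.open(raw, "rt", encoding="utf-8") as lines:
        for n, line in enumerate(lines, 1):
            ...                      # parse the line here
            if n % 1_000_000 == 0:
                print(f"progress: {raw.tell() / total:.1%}")
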
2 votes • 0 answers

How to split a json.bz2 file randomly without iterating over the file in Python?

I have a json.bz2 file over 50GB in size. I would like to split the file into partitions to run the processing across multiple threads in Python. Could you please suggest an ideal way to split a json.bz2 file randomly (without reading / iterating) using Python…
rk_acumen • 21 • 2
2 votes • 1 answer

bz2 module fails when building Python 3.7

I'm trying to cross compile Python 3.7 for Android. I see in my output that bz2 is failing with the following error: building '_bz2' extension /home/dematic/SPE/python3-android/sdk/android-ndk-r19c/toolchains/llvm/prebuilt/linux-x86_64/bin/clang…
Brian S • 3,096 • 37 • 55