Questions tagged [bz2]

For issues relating to bz2, the file extension used for files compressed with bzip2.

Files compressed with bzip2 are frequently given the bz2 extension. bunzip2 should be used to decompress these files.

tar supports bzip2 via the -j option, which can be used to create or extract bzip2-compressed archives.
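
For reference, both operations can also be done from Python's standard library; a minimal sketch with placeholder file names:

    import bz2
    import shutil
    import tarfile

    # Decompress a single .bz2 file (roughly what `bunzip2 -k example.bz2` does).
    with bz2.open("example.bz2", "rb") as src, open("example", "wb") as dst:
        shutil.copyfileobj(src, dst)

    # Extract a bzip2-compressed tar archive (roughly `tar -xjf archive.tar.bz2`).
    with tarfile.open("archive.tar.bz2", "r:bz2") as tar:
        tar.extractall("output_dir")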

See also the bzip2 tag.

106 questions
11 votes • 4 answers

Python decompression relative performance?

TLDR: Of the various compression algorithms available in Python (gzip, bz2, lzma, etc.), which has the best decompression performance? Full discussion: Python 3 has various modules for compressing/decompressing data, including gzip, bz2 and lzma. gzip…
ibrewster • 3,482 • 5 • 42 • 54
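
A rough, self-contained way to compare the three stdlib codecs asked about above; the payload is a synthetic placeholder, so real data may rank differently:

    import bz2
    import gzip
    import lzma
    import time

    payload = b"some repetitive sample data " * 1_000_000  # synthetic placeholder

    for name, mod in (("gzip", gzip), ("bz2", bz2), ("lzma", lzma)):
        blob = mod.compress(payload)
        start = time.perf_counter()
        mod.decompress(blob)
        elapsed = time.perf_counter() - start
        print(f"{name}: {len(blob):>10} compressed bytes, {elapsed:.3f}s to decompress")
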
8 votes • 4 answers

Reading the first lines of bz2 files in Python

I am trying to extract the first 10,000 lines from a bz2 file. import bz2 file = "file.bz2" file_10000 = "file.txt" output_file = codecs.open(file_10000,'w+','utf-8') source_file = bz2.open(file, "r") count = 0 for line in…
student • 511 • 1 • 5 • 20
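
A minimal sketch for the question above, keeping the question's file names; itertools.islice stops after 10,000 lines so the rest of the file is never decompressed:

    import bz2
    import itertools

    source = "file.bz2"    # file names taken from the question
    target = "file.txt"

    with bz2.open(source, "rt", encoding="utf-8") as src, \
            open(target, "w", encoding="utf-8") as dst:
        dst.writelines(itertools.islice(src, 10000))
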
7 votes • 2 answers

Spark: difference when reading in .gz and .bz2

I normally read and write files in Spark using .gz, where the number of files should be the same as the number of RDD partitions, i.e. one giant .gz file will be read into a single partition. However, if I read in one single .bz2, would I still get…
Edamame • 23,718 • 73 • 186 • 320
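
A quick way to check the partitioning difference asked about above; a sketch assuming a local SparkContext and hypothetical file paths:

    from pyspark import SparkContext

    sc = SparkContext.getOrCreate()

    gz_rdd = sc.textFile("logs/big_file.gz")    # gzip is not splittable:
    print(gz_rdd.getNumPartitions())            # the whole file lands in one partition

    bz2_rdd = sc.textFile("logs/big_file.bz2")  # bzip2 is a splittable codec:
    print(bz2_rdd.getNumPartitions())           # Spark can split it across partitions
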
6 votes • 1 answer

How can I extract a bz2 file in Java on Android?

How can I extract a bz2 file in Java on Android? Are there any libraries included in Android?
bordeux • 612 • 1 • 8 • 23
6 votes • 1 answer

List all files in a .tar.bz2, sorted by size

I use this command to list all files in an archive: tar jtvf blah.tar.bz2 How to list them sorted by size? Or list only the biggest files (i.e. files bigger than, say, 10MB)?
Basj • 41,386 • 99 • 383 • 673
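
A Python alternative (rather than a shell one-liner) for the listing question above, using the stdlib tarfile module; "blah.tar.bz2" is the archive name used in the question:

    import tarfile

    with tarfile.open("blah.tar.bz2", "r:bz2") as tar:
        files = [m for m in tar.getmembers() if m.isfile()]

    # largest first, keeping only members bigger than 10 MB
    for m in sorted(files, key=lambda m: m.size, reverse=True):
        if m.size > 10 * 1024 * 1024:
            print(f"{m.size:>12}  {m.name}")
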
5 votes • 2 answers

How to read lines from arbitrary BZ2 streams for CSV?

The bz2 module provides a standard open() method from which one can call readline(). However, my situation is one where I have a stream (pointing to a large amount of data) that I want to decompress lines from on the fly. My current implementation…
Neil C. Obremski • 18,696 • 24 • 83 • 112
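
One possible approach to the streaming question above: bz2.open also accepts an existing file-like object, so the decompressed text can be fed straight to csv.reader. The helper name and the in-memory stream below are illustrative only:

    import bz2
    import csv
    import io

    def csv_rows_from_bz2_stream(binary_stream):
        """Decompress a bz2 binary stream on the fly and yield CSV rows."""
        # bz2.open accepts an open file object as well as a file name
        text = bz2.open(binary_stream, "rt", encoding="utf-8", newline="")
        yield from csv.reader(text)

    # illustrative in-memory stream; any file-like object would do
    raw = io.BytesIO(bz2.compress(b"a,b,c\n1,2,3\n"))
    for row in csv_rows_from_bz2_stream(raw):
        print(row)
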
4 votes • 2 answers

Python: Convert Raw String to Bytes String without adding escape characters

I have a string: 'BZh91AY&SYA\xaf\x82\r\x00\x00\x01\x01\x80\x02\xc0\x02\x00 \x00!\x9ah3M\x07<]\xc9\x14\xe1BA\x06\xbe\x084' And I want: b'BZh91AY&SYA\xaf\x82\r\x00\x00\x01\x01\x80\x02\xc0\x02\x00 \x00!\x9ah3M\x07<]\xc9\x14\xe1BA\x06\xbe\x084' But I…
Bryan Yao • 65 • 2 • 7
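
For the conversion above, a common trick is a 'latin-1' encode, which works when every character's code point is below 256; a sketch using the questioner's string:

    text = 'BZh91AY&SYA\xaf\x82\r\x00\x00\x01\x01\x80\x02\xc0\x02\x00 \x00!\x9ah3M\x07<]\xc9\x14\xe1BA\x06\xbe\x084'

    # 'latin-1' maps each code point 0-255 straight to the byte with the same
    # value, so no extra escaping is introduced
    data = text.encode('latin-1')
    print(data[:10])   # b'BZh91AY&SY' -- the start of a bzip2 stream
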
4 votes • 1 answer

Boost 1.59 not decompressing all bzip2 streams

I've been trying to decompress some .bz2 files on the fly and line-by-line, so to speak, as the files I'm dealing with are massive when uncompressed (in the region of 100 GB), so I wanted a solution that saves disk space. I have no problems…
Primalfido • 53 • 4
3 votes • 0 answers

too many values to unpack (expected 2) in Jupyter

I use compressed pickle to save the results from sklearn gridsearch using the following code. import pickle import bz2 from sklearn.model_selection import RandomizedSearchCV search = RandomizedSearchCV(estimator, param_distributions=param_dist, …
kaidi • 31 • 2
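
A minimal round-trip sketch for compressed pickling as in the question above; the results object and file name are placeholders. One common cause of the unpacking error is assigning pickle.load's single return value to two names:

    import bz2
    import pickle

    results = {"best_score": 0.9, "best_params": {"n_estimators": 100}}  # placeholder

    # save with bz2 compression
    with bz2.BZ2File("search_results.pkl.bz2", "wb") as f:
        pickle.dump(results, f)

    # pickle.load returns exactly one object; unpacking it into two names only
    # works if that object is itself a 2-item sequence
    with bz2.BZ2File("search_results.pkl.bz2", "rb") as f:
        restored = pickle.load(f)
    print(restored)
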
3 votes • 3 answers

How to parse a Wikidata JSON (.bz2) file using Python?

I want to look at entities and relationships using Wikidata. I downloaded the Wikidata JSON dump (from here .bz2 file, size ~ 18 GB). However, I cannot open the file, it's just too big for my computer. Is there a way to look into the file without…
pajamas • 1,194 • 1 • 12 • 25
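
A streaming sketch for the dump question above, assuming the usual layout of the Wikidata JSON dump (one entity per line inside a single JSON array); the dump file name is illustrative:

    import bz2
    import json

    with bz2.open("latest-all.json.bz2", "rt", encoding="utf-8") as dump:
        for line in dump:
            line = line.strip()
            if line in ("[", "]"):      # the dump is one big JSON array
                continue
            entity = json.loads(line.rstrip(","))
            print(entity["id"], entity.get("labels", {}).get("en", {}).get("value"))
            break                       # drop the break to walk the whole dump
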
3 votes • 0 answers

How does Spark handle HDFS compressed files and how to choose an HDFS compression codec (splittable or not splittable)

Background: We have a project that uses Spark to process some log/csv files; each file is very large, for example 20GB, so we need to compress the log/csv files. Example: the HDFS block size is 128M and we have a 1GB log file. If the file is not compressed,…
pxchen • 51 • 1 • 4
3 votes • 1 answer

How to capture tcpdump output to a compressed file in Linux

I have a DNS server and I want to capture DNS traffic to get all the IPs which use my DNS server. For this I started using the following tcpdump command to capture the traffic to a file: tcpdump -n -i eth0 dst port 53 >> dns_data.log But the file size is high…
Yasiru G • 6,886 • 6 • 23 • 43
2 votes • 1 answer

Python 3 bz2 huge file and progress

I'm implementing a tool that parses a huge, 248GB set of files compressed in bz2 format. The average compression factor is 0.04, so decompressing them to over 6 terabytes beforehand is quite out of the question. Each line of the content files is a…
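
One way to report progress for the question above is to open the raw file yourself and track how many compressed bytes have been consumed; the figure is approximate because the decompressor reads ahead, and the file name is a placeholder:

    import bz2
    import os

    path = "huge_input.bz2"          # placeholder file name
    total = os.path.getsize(path)

    # open the raw file explicitly so its tell() reports compressed bytes consumed;
    # bz2.open also accepts an already-open file object
    with open(path, "rb") as raw, bz2.open(raw, "rt", encoding="utf-8") as lines:
        for n, line in enumerate(lines, 1):
            ...                      # parse the line here
            if n % 1_000_000 == 0:
                print(f"progress: {raw.tell() / total:.1%}")
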
2 votes • 0 answers

How to split a json.bz2 file randomly without iterating over the file in Python?

I have a json.bz2 file over 50GB in size. I would like to split the file into partitions to run the processing across multiple threads in Python. Could you please suggest an ideal way to split a json.bz2 file randomly (without reading / iterating) using Python…
rk_acumen • 21 • 2
2 votes • 1 answer

bz2 module fails when building Python 3.7

I'm trying to cross compile Python 3.7 for Android. I see in my output that bz2 is failing with the following error: building '_bz2' extension /home/dematic/SPE/python3-android/sdk/android-ndk-r19c/toolchains/llvm/prebuilt/linux-x86_64/bin/clang…
Brian S • 3,096 • 37 • 55