175

I need to extract a gz file that I have downloaded from an FTP site to a local Windows file server. I have the variables set for the local path of the file, and I know it can be used by the GZIP module.

How can I do this? The file inside the GZ file is an XML file.

Peter Mortensen
Darkdeamon

10 Answers

265
import gzip
import shutil
with gzip.open('file.txt.gz', 'rb') as f_in:
    with open('file.txt', 'wb') as f_out:
        shutil.copyfileobj(f_in, f_out)
Erick
Matt
  • Why did you put a second with? Is that common practice? You can open several files with the same context manager. – RomainL. May 27 '19 at 15:06
  • Probably because you read f_in and write f_out. According to the docs, you need params for a read obj and a write obj, https://docs.python.org/3/library/shutil.html#shutil.copyfileobj. – paxton91michael Jun 21 '19 at 19:00
  • @Matt Shouldn't you also close f_in and f_out? – JeyJ Sep 02 '19 at 21:36
  • @JeyJ: this is the purpose of the 'with' statement. It executes f_in.close() at the exit of the "with" section. Really useful if something goes wrong (like an exception). It makes sure that the resource is closed. – sweetdream Sep 04 '19 at 06:27
  • Note that [`shutil.copyfileobj()`](https://docs.python.org/3/library/shutil.html#shutil.copyfileobj) has a third parameter `length`: *"The integer length, if given, is the buffer size. In particular, a negative length value means to copy the data without looping over the source data in chunks; by default the data is read in chunks to avoid uncontrolled memory consumption."* – norok2 Jun 17 '20 at 10:03
  • How can I simply extract the file in the .gz? In your example, it is assumed that you know the filename within the gz file. I just want to unzip the gz file and save all the files within it, with their original names, under the current folder. – XYZ Feb 11 '21 at 09:13
  • @Yu Xiang, gzip can only hold one file at a time; that's why it's often used with a tar archive. If you have a .tar.gz or a .tgz file, you should take a look at the tarfile module, not the gzip module. – mrBen Feb 18 '21 at 17:50
  • This one should be, by far, the accepted answer. – ncarrier Jun 04 '21 at 06:45
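Following up on the comments about the nested with statements and about not knowing the output filename, here is a minimal sketch that opens both files in a single with statement. When no output path is given, it assumes the output name is simply the input name minus its trailing .gz (a naming convention, not something the gzip format guarantees). The helper name gunzip_to_disk is hypothetical.

```python
import gzip
import shutil
from pathlib import Path


def gunzip_to_disk(gz_path, out_path=None):
    """Stream-decompress a single .gz file to disk.

    If out_path is omitted, it is derived by dropping the trailing ".gz"
    (e.g. "feed.xml.gz" -> "feed.xml"); this naming rule is an assumption.
    """
    gz_path = Path(gz_path)
    if out_path is None:
        if gz_path.suffix.lower() != ".gz":
            raise ValueError(f"cannot derive an output name from {gz_path}")
        out_path = gz_path.with_suffix("")  # removes only the final suffix
    # One with statement manages both files; both are closed on exit,
    # even if an exception is raised inside the block.
    with gzip.open(gz_path, "rb") as f_in, open(out_path, "wb") as f_out:
        shutil.copyfileobj(f_in, f_out)  # copies in chunks, not all at once
    return Path(out_path)
```

For example, gunzip_to_disk(r"C:\data\feed.xml.gz") would write C:\data\feed.xml next to the archive (the path is only an illustration).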
62

From the documentation:

import gzip
with gzip.open('file.txt.gz', 'rb') as f:
    file_content = f.read()
heinst
  • this solution works for me on python 2.7 without importing any library, @heinst thanks a lot – Farzad Farazmand Oct 09 '19 at 14:58
    Just to note that this could blow up your memory as it will decompress everything into RAM at once. If you just want to decompress the file without loading it, the accepted answer by Matt is best. – Michael May 01 '21 at 10:51
    @Michael This was taken straight from the Python docs at the time, so tell them that – heinst May 03 '21 at 17:26
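Since the file inside the .gz is XML, you may not need a decompressed copy on disk at all. A sketch, assuming the standard-library xml.etree.ElementTree parser is enough for your XML (the function names load_gzipped_xml and iter_elements are hypothetical): ET.parse() and ET.iterparse() both accept a file object, so the gzip stream can be handed to them directly. The iterparse variant keeps memory lower on large files, which speaks to the concern above about reading everything into RAM.

```python
import gzip
import xml.etree.ElementTree as ET


def load_gzipped_xml(gz_path):
    """Parse a whole XML document straight from a .gz file.

    No temporary .xml file is written, but the resulting tree is still
    held in memory in full.
    """
    with gzip.open(gz_path, "rb") as f:
        return ET.parse(f).getroot()


def iter_elements(gz_path, tag):
    """Yield elements with the given tag one at a time (for large files)."""
    with gzip.open(gz_path, "rb") as f:
        for _event, elem in ET.iterparse(f, events=("end",)):
            if elem.tag == tag:
                yield elem
                elem.clear()  # free the element's contents once consumed
```

Because iter_elements() clears each element after the caller has seen it, copy out any data you need before advancing the generator.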
26

Maybe you want to pass it to pandas as well.

import gzip
import pandas as pd

with gzip.open('features_train.csv.gz') as f:
    features_train = pd.read_csv(f)

features_train.head()
bfontaine
Feiyang.Chen
    What has this got to do with Pandas? "***The file inside the GZ file is an XML file***" -- OP – c z Apr 15 '20 at 13:51
    This is a very helpful answer. Users might land on this page from a search engine, and pandas handles xml files quite well actually. – Wtower Feb 25 '22 at 13:53
10
from sh import gunzip

gunzip('/tmp/file1.gz')
bfontaine
perfecto25
8

Not an exact answer, because you're using XML data and there was no pd.read_xml() function at the time (as of v0.23.4; it was added later, in pandas 1.3.0), but pandas (starting with v0.21.0) can uncompress the file for you! Thanks, Wes!

import pandas as pd
import os
fn = '../data/file_to_load.json.gz'
print(os.path.isfile(fn))
df = pd.read_json(fn, lines=True, compression='gzip')
df.tail()
whs2k
    While this code may answer the question, providing additional context regarding how and/or why it solves the problem would improve the answer's long-term value. – Nic3500 Aug 07 '18 at 00:26
    great answer. It simply reads a compressed json in a very simple (pythonic) way. – lordcenzin Dec 03 '19 at 20:35
7

If you are parsing the file after unzipping it, don't forget to use the decode() method; it is necessary when you open a file in binary mode.

import gzip
with gzip.open('file.gz', 'rb') as f:
    for line in f:
        print(line.decode().strip())
Pedro J. Sola
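An alternative to calling decode() on every line is to open the archive in text mode. A small sketch, assuming the content is UTF-8 (read_lines is a hypothetical helper name): with mode "rt", gzip.open() wraps the decompressed stream in a text layer, so the lines come back as str directly.

```python
import gzip


def read_lines(gz_path, encoding="utf-8"):
    """Return the stripped lines of a gzip-compressed text file."""
    # "rt" = read, text mode; the encoding is an assumption about the file.
    with gzip.open(gz_path, "rt", encoding=encoding) as f:
        return [line.strip() for line in f]
```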
4

It is very simple. Here you go!

import gzip

# path to the file to be extracted
ip = "sample.gzip"

# output file to be filled
op = open("output_file", "w", encoding="utf-8")

with gzip.open(ip, "rb") as ip_byte:
    # read all compressed bytes, decompress, decode to str, write as text
    op.write(ip_byte.read().decode("utf-8"))
op.close()
4

You can use gzip.decompress() to do it:

  1. read input file using rb mode;
  2. open output file using w mode and utf8 encoding;
  3. gzip.decompress() input bytes;
  4. decode what you get to str;
  5. write str to output file.
import gzip


def decompress(infile, tofile):
    with open(infile, 'rb') as inf, open(tofile, 'w', encoding='utf8') as tof:
        decom_str = gzip.decompress(inf.read()).decode('utf-8')
        tof.write(decom_str)
secsilm
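For the XML case specifically, decoding and re-encoding may be unnecessary: an XML file declares its own encoding, and writing the decompressed bytes unchanged preserves it. A minimal sketch of the same steps without the decode/encode round trip (decompress_bytes is a hypothetical name); like the answer above, it reads the whole file into memory, so shutil.copyfileobj() remains the better fit for very large files.

```python
import gzip
from pathlib import Path


def decompress_bytes(infile, tofile):
    """Decompress a .gz file to disk as raw bytes, without decoding.

    The original byte encoding of the content is preserved exactly.
    """
    Path(tofile).write_bytes(gzip.decompress(Path(infile).read_bytes()))
```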
1

If you have the gzip (and gunzip) programs installed on your computer, a simple way is to call that command from Python:

import os
filename = 'file.txt.gz'
os.system('gunzip ' + filename)

Optionally, if you want to preserve the original file, use:

os.system('gunzip --keep ' + filename)
mgb
  • On older systems you might have to use gunzip -c file.txt.gz > file.txt, so the command would be: os.system('gunzip -c ' + filename + ' > ' + filename[:-3]) – mgb May 12 '21 at 17:50
  • os.system("gunzip path/to/filename") gives the error "sh: gunzip: command not found", but from the command line I can use gunzip. Any clue why this is happening? – JustTry Sep 06 '22 at 20:58
  • It's possible your Python distribution uses a different shell (or path settings) than your command line. Find the full path to the gunzip application ("which gunzip" on *nix systems), then enter it like os.system('/opt/local/bin/gunzip ' + filename) – mgb Sep 17 '22 at 15:51
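A sketch of the same idea using subprocess instead of os.system, assuming a gunzip executable is installed (a stock Windows machine usually does not have one). It looks up the full path with shutil.which(), which sidesteps the PATH problem described in the comments, passes the arguments as a list so filenames with spaces need no shell quoting, and uses gunzip -c so the .gz file is kept even on gzip versions without --keep. The helper name gunzip_external is hypothetical.

```python
import shutil
import subprocess


def gunzip_external(gz_path, out_path):
    """Decompress gz_path into out_path by running the external gunzip."""
    exe = shutil.which("gunzip")  # full path, or None if not on PATH
    if exe is None:
        raise FileNotFoundError("gunzip executable not found on PATH")
    # "-c" writes the decompressed data to stdout and leaves the .gz in place;
    # check=True raises CalledProcessError if gunzip exits with an error.
    with open(out_path, "wb") as out:
        subprocess.run([exe, "-c", str(gz_path)], stdout=out, check=True)
```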
-5

If you have a Linux environment, it is very easy to unzip using the gunzip command. Go to the file's folder and run:

gunzip file-name 
Josef
arunchiri