
I have many (around 1000) .bz2 files, each between 50 and 200 MB and each containing 4 .txt (.dat) files. How can I read specific information from the .txt (.dat) files without decompressing them? I am only a beginner Python 3 user, so please give me some hints or useful examples. Thank you.
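bzip2 data cannot be searched without decompressing it, but the extracted files do not have to be written to disk. Python's standard `tarfile` module opens a `.tar.bz2` archive directly (mode `'r:bz2'`), and `TarFile.extractfile()` returns a file-like object that decompresses the member as you read it. The following is a minimal sketch. It assumes the archives are bzip2-compressed tar files and that the members are UTF-8 text. The function name `read_lines_containing`, the member name and the search string are hypothetical placeholders.

```python
import io
import tarfile


def read_lines_containing(archive_path, member_name, needle):
    """Hypothetical helper: return the lines of one archive member that contain `needle`.

    The member is decompressed in memory while it is read; nothing is written to disk.
    """
    matches = []
    with tarfile.open(archive_path, "r:bz2") as archive:
        # Raises KeyError if the name is not in the archive; returns None for non-files.
        raw = archive.extractfile(member_name)
        if raw is None:
            return matches
        # Wrap the binary stream so it can be iterated line by line as text (assumes UTF-8).
        with io.TextIOWrapper(raw, encoding="utf-8") as text:
            for line in text:
                if needle in line:
                    matches.append(line.rstrip("\n"))
    return matches
```

Member names inside a tar can include a directory prefix (for example `data/example.txt`). `archive.getnames()` lists the exact names to pass in.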

I wrote code that extracts the .txt file(s) into a temporary folder, but it takes about 40 seconds to process a single 170 MB tar archive, and I have thousands of them.

import os
import shutil
import tarfile
import tempfile

pa = '/home/user/tar'  # location of the .tar.bz2 archives
fds = sorted(os.listdir(pa))
i = 0
for bz in fds:
    path = os.path.join(pa, bz)  # full path of this archive
    i += 1
    tmpdir = tempfile.mkdtemp(dir=os.getcwd())
    # open the bzip2-compressed tar archive and extract one member into tmpdir
    with tarfile.open(path, 'r:bz2') as archive:
        archive.extract('example.txt', path=tmpdir)
    path_to_my_file = os.path.join(tmpdir, 'example.txt')
    # here goes some simple manipulation of the .txt (e.g. print something)
    with open(path_to_my_file) as f:
        print(f.readline())
    shutil.rmtree(tmpdir)
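The archives are independent of each other, and bzip2 decompression is CPU-bound. With thousands of archives, the work can therefore be spread across CPU cores. The following is a sketch built on the standard library's `concurrent.futures.ProcessPoolExecutor`. It reads the members in memory instead of extracting them into a temporary folder. The job `process_archive`, the `.tar.bz2`/`.tbz2` file-name filter and the `"42"` search string are hypothetical placeholders for whatever information you actually need.

```python
import os
import tarfile
from concurrent.futures import ProcessPoolExecutor


def process_archive(path, needle="42"):
    """Hypothetical per-archive job: count lines containing `needle`
    in every .txt/.dat member, reading each member in memory."""
    counts = {}
    with tarfile.open(path, "r:bz2") as archive:
        for member in archive:  # TarInfo objects, in archive order
            if not member.isfile() or not member.name.endswith((".txt", ".dat")):
                continue
            with archive.extractfile(member) as f:  # binary stream, decompressed on read
                counts[member.name] = sum(needle.encode() in line for line in f)
    return path, counts


def process_all(folder, workers=None):
    """Run process_archive on every archive in `folder` in parallel.

    workers=None lets the executor choose a default based on the CPU count.
    """
    paths = sorted(
        os.path.join(folder, name)
        for name in os.listdir(folder)
        if name.endswith((".tar.bz2", ".tbz2"))
    )
    # Each archive is decompressed in a separate worker process.
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(process_archive, paths))
```

Call `process_all('/home/user/tar')` from under an `if __name__ == "__main__":` guard. The guard is required on platforms where worker processes re-import the main module. The speed-up is bounded by the number of cores and by disk throughput, because each archive still has to be fully decompressed once.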
Cœur
  • I imagine it would be much easier to use the `tar` command to write the uncompressed files to stdout and redirect that input into your python program, so your python code doesn't have to reinvent `tar`. – John Gordon Sep 30 '18 at 04:48
  • I've added my code – Kirill Ustinov Sep 30 '18 at 09:24
  • Possible duplicate of [reading tar file contents without untarring it, in python script](https://stackoverflow.com/questions/2018512/reading-tar-file-contents-without-untarring-it-in-python-script) – ForceBru Sep 30 '18 at 12:10
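John Gordon's suggestion can be paired with a small Python filter that reads from standard input. With GNU tar, `-x -j -O` extracts a bzip2-compressed archive to stdout instead of to files. For example, `tar -xjOf archive.tar.bz2 example.txt | python3 filter.py 42` streams one member into the script. `filter.py`, the member name and the search string here are hypothetical. The following sketch of such a filter assumes the text is in the locale's default encoding.

```python
import sys


def matching_lines(lines, needle):
    """Yield the lines (without trailing newline) that contain `needle`."""
    for line in lines:
        if needle in line:
            yield line.rstrip("\n")


if __name__ == "__main__" and len(sys.argv) > 1:
    # Only run as a filter when a search string is given on the command line.
    for line in matching_lines(sys.stdin, sys.argv[1]):
        print(line)
```

The decompression still happens (inside `tar`), but no temporary files are created, and the filtering code stays independent of the archive format.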
