
Extract .txt files from a zipped folder with multiple subfolders and rename these extracted files based on their file location.

Step 1: Zipped File Extraction

import os, zipfile,glob,shutil

INSERT Folder name where zipped folders are available

dir_name = r'C:\Users\user1\Documents\Extract .txt files\ZIPPED\\'
extension = '.zip'
os.chdir(dir_name) #change directory to point to this location

Check for .zip files, unzip and extract all files in "All Extracted Files" folder

for item in os.listdir(dir_name): #loop through items in this dir
    if item.endswith(extension): #check for ".zip" extension
        file_name = os.path.abspath(item) #get full path of files
        zip_ref = zipfile.ZipFile(file_name) #create zip_ref as zipfile object
        zip_ref.extractall('All Extracted Files') #extract file to dir
        zip_ref.close() #close file

FEED Folder name where .txt files should be saved

newpath = r'C:\Users\user1\Documents\Extract .txt files\ZIPPED\TXT Files\\'
if not os.path.exists(newpath): #if folder doesn't exist then create one
    os.makedirs(newpath)

Step 2: Copy only .txt files to the "TXT Files" folder

srcdir = r'C:\Users\user1\Documents\Extract .txt files\ZIPPED\All Extracted Files\\'
dstdir = r'C:\Users\user1\Documents\Extract .txt files\ZIPPED\TXT Files\\'
for root, dirs, files in os.walk(srcdir):
    for file in files:
        if file[-4:].lower() == '.txt':
            shutil.copy(os.path.join(root, file), os.path.join(dstdir, file))

I am stuck on how to rename these extracted files based on their location (folder name). Example: rename doc1.txt to Foldername_doc1.txt.

Any ideas, please?

Sasha18
  • How does it work for files in subfolders of subfolders, etc? – martineau Dec 01 '20 at 07:12
  • os.walk() generates the file names in a directory tree by walking the tree either top-down or bottom-up, this will help to look for specific files in subfolders of subfolders – Sasha18 Dec 01 '20 at 09:05
  • I know how `os.walk()` works. What I meant was how do you want to rename the files in that case (where it's nested in two or more subfolders). – martineau Dec 01 '20 at 15:36
  • It should just consider the name of subfolder it is located in... – Sasha18 Dec 02 '20 at 10:07
  • Does this answer your question? [How do I get the parent directory in Python?](https://stackoverflow.com/questions/2860153/how-do-i-get-the-parent-directory-in-python) – Tomerikoo Dec 08 '20 at 10:34
  • Thanks @tomerikoo, yea it partially answers my query – Sasha18 Dec 11 '20 at 11:09

1 Answer


I highly recommend doing this sort of thing with the pathlib module (available in Python 3.4+), because it makes much of the required path manipulation easier and more readable than the functions in os.path. I have converted your code to use it.
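As a small illustration of that readability difference, here is a sketch that builds the Foldername_doc1.txt style name both ways. The path used is hypothetical, and PurePosixPath is used only so the snippet behaves the same on any operating system.

```python
import os.path
from pathlib import PurePosixPath

# Hypothetical path, for illustration only.
path_str = 'ZIPPED/All Extracted Files/reports/doc1.txt'

# os.path: nested function calls, read from the inside out.
folder = os.path.basename(os.path.dirname(path_str))
name_via_os_path = folder + '_' + os.path.basename(path_str)

# pathlib: attribute access on a path object, read left to right.
p = PurePosixPath(path_str)
name_via_pathlib = f'{p.parent.name}_{p.name}'

print(name_via_os_path)   # reports_doc1.txt
print(name_via_pathlib)   # reports_doc1.txt
```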

While the code below worked in the relatively simple testing I did, it has several limitations you need to be aware of. One is that if it processes multiple .zip files, the contents left over from an earlier archive can interfere with those of a later one, because every archive is extracted into the same folder.
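One possible way around the multiple-archive issue is to give each .zip file its own extraction subfolder, named after the archive. The following is only a sketch under that assumption; the function name extract_each_zip and its parameters are made up for illustration and are not part of the code in this answer.

```python
from pathlib import Path
import zipfile


def extract_each_zip(zip_dir, extract_root):
    """Extract every .zip in zip_dir into extract_root/<archive name>/.

    Hypothetical helper: keeping each archive in its own subfolder means
    files from one archive cannot overwrite files from another.
    """
    zip_dir = Path(zip_dir)
    extract_root = Path(extract_root)
    targets = []
    for zip_path in sorted(zip_dir.iterdir()):
        if zip_path.is_file() and zip_path.suffix.lower() == '.zip':
            target = extract_root / zip_path.stem  # e.g. 'data.zip' -> 'data'
            target.mkdir(parents=True, exist_ok=True)
            with zipfile.ZipFile(zip_path) as zip_ref:
                zip_ref.extractall(target)
            targets.append(target)
    return targets
```

With this layout, a .txt file stored at the top level of an archive would end up prefixed with the archive's name rather than with the name of the shared extraction folder.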

Another is that just prefixing the file name with its containing subfolder name does not guarantee that it will be unique, because two subfolders in a folder hierarchy can have the same name. For example, /foo/bar/baz and /foo/doo/baz might both contain a file with the same name, so prefixing that file name with baz_ would not distinguish one from the other.
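If unique names matter more than short ones, one option is to build the prefix from every folder between the extraction folder and the file, not just the last one. This is a sketch of that alternative, not what the code in this answer does; flattened_name is a hypothetical helper.

```python
from pathlib import Path


def flattened_name(srcpath, srcdir):
    """Join every path component below srcdir with underscores.

    Hypothetical helper: 'foo/bar/baz/doc1.txt' -> 'foo_bar_baz_doc1.txt'.
    """
    relative = Path(srcpath).relative_to(srcdir)
    return '_'.join(relative.parts)


srcdir = Path('All Extracted Files')
print(flattened_name(srcdir / 'foo' / 'bar' / 'baz' / 'doc1.txt', srcdir))
# foo_bar_baz_doc1.txt
print(flattened_name(srcdir / 'foo' / 'doo' / 'baz' / 'doc1.txt', srcdir))
# foo_doo_baz_doc1.txt
```

Names can still collide when folder names themselves contain underscores (a_b/c.txt and a/b_c.txt both flatten to a_b_c.txt), so checking whether the destination already exists before copying is still worthwhile.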

import os
from pathlib import Path
import shutil
import zipfile


# Where zipped folders are available.
dir_name = Path('C:/Users/user1/Documents/Extract .txt files/ZIPPED')
srcdir   = dir_name / 'All Extracted Files'
dstdir   = dir_name / 'TXT Files'
EXTENSION = '.zip'


# Check for .zip files, unzip and extract all files in "All Extracted Files" folder.
for item in os.listdir(dir_name):
    filename = (dir_name / item).resolve() # Get full file path.
    if filename.suffix.lower() == EXTENSION: # Zip file?
        with zipfile.ZipFile(filename) as zip_ref:
            zip_ref.extractall(srcdir) # extract files to dir

if not dstdir.exists(): # Create destination folder if it doesn't exist.
    os.makedirs(dstdir)

# Copy only .txt files to destination "TXT Files" folder.
for root, dirs, files in os.walk(srcdir):
    root = Path(root)
    for file in files:
        srcpath = root / file
        if srcpath.suffix.lower() == '.txt':  # Text file?
            dstname = root.name + '_' + file  # Prefix filename with source folder name.
            dstpath = dstdir / dstname
            shutil.copy(srcpath, dstpath)

print('Done')

martineau
  • Thanks sir `filename = (dir_name / item).resolve()` throws up "unexpected character error" can you help with this please... – Sasha18 Dec 08 '20 at 09:43