
I have this Python script, which takes three arguments:

  1. the path of a directory containing the files to rename
  2. a CSV file with two columns that maps the old file names to the new ones:
     original,new
     barcode01,sample01
     barcode02,sample02
  3. the extension of the files (e.g. .txt, .bam, .png, .txt.readdb.log), which can be long.

The script:

import os
import csv
import sys

def rename_files(path, name_map, ext):
    with open(name_map, 'r') as csv_map:
        filereader = csv.DictReader(csv_map)
        for row in filereader:
            original_name = row["original"]
            new_name = row["new"]
            old_filename = '%s/%s.%s' % (path, original_name, ext)
            new_filename = '%s/%s_%s.%s' % (path, new_name, original_name, ext)
            try:
                os.rename(old_filename, new_filename)
            except Exception as e:
                print('Rename for file %s failed. Details: ' % old_filename)
                print(e)

if __name__ == '__main__':
    filename, path, name_map, ext = sys.argv
    rename_files(path, name_map, ext) 

For example:

python rename.py /test/directory filestorename.csv txt

will only rename barcode01.txt (to sample01_barcode01.txt, given the format string in the script).

However, there are multiple barcode01 files with different extensions (e.g. barcode01.png). Instead of passing each extension as an argument, how can I modify this script to rename all of these files at once, keeping each file's extension unchanged?

J-6474

2 Answers

1

Assuming all files exist and the original column of the CSV holds each file's path (including its extension), you can extract the directory, base name, and file extension as follows:

from csv import DictReader
from os import path, rename
from sys import exit

import argparse

def rename_file(row):
    origin = row['original']
    directory = path.dirname(origin)
    _, extension = path.splitext(path.basename(origin))
    target = path.join(directory, '{}{}'.format(row['new'], extension))
    return rename(origin, target)

Call it inside a loop:

def rename_files(spreadsheet):
    # Use a context manager so the CSV file is closed when done
    with open(spreadsheet, newline='') as stream:
        for row in DictReader(stream):
            # Skip rows whose original file does not exist
            if path.isfile(row['original']):
                rename_file(row)

You may also improve your main function:

def main():
    parser = argparse.ArgumentParser(description='rename files from *.csv')
    parser.add_argument(
        '-f', '--file',
        metavar='file',
        type=str,
        required=True,  # without this, args.file is None when -f is omitted
        help='csv (comma-separated values) file'
    )

    args = parser.parse_args()

    if not path.isfile(args.file):
        print('No such file: {}'.format(args.file))
        return exit(1)

    return rename_files(args.file)


if __name__ == '__main__':
    main()
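
To illustrate what the os.path helpers in rename_file return, here is a small sketch using a made-up path (the path and the sample01 name are only examples). It shows that path.splitext splits off only the last suffix, and one possible way to keep a multi-part extension instead, assuming the sample name itself contains no dots:

```python
from os import path

# Hypothetical example path, not taken from a real directory
origin = "/test/directory/barcode01.pass.vcf.gz"

directory = path.dirname(origin)        # directory part of the path
base = path.basename(origin)            # file name without the directory
stem, extension = path.splitext(base)   # splits at the LAST dot only

# Splitting at the FIRST dot keeps the whole multi-part extension.
# Assumption: the sample name itself contains no dots.
name, dot, full_extension = base.partition(".")
target = path.join(directory, "sample01" + dot + full_extension)

print(extension)       # .gz
print(full_extension)  # pass.vcf.gz
print(path.basename(target))  # sample01.pass.vcf.gz
```

Whether splitting at the first dot is appropriate depends on how the files are named; it would misbehave for a sample name such as sample.01.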
JP Ventura
  • Thank you so much! How do I modify this to handle longer extensions (e.g. sample01.alignreport.txt or sample01.pass.vcf.gz versus just sample01.txt)? This works, but the renamed file keeps only the final suffix of the extension. – J-6474 Sep 02 '21 at 17:19
  • This is not a _longer file extension_, but rather a compressed file. If you are working with Pandas, you should use `pd.read_csv` function with `compression` parameter. – JP Ventura Sep 06 '21 at 17:46
  • Are you renaming files without considering their actual encoding? Keep in mind that `*.csv` and `*.tsv` are just for loading datasets, while `*.vcf` for contacts. Files `*.bz`, `*.gz`, `*.rar`, and `*.zip` are just compressed files or folders. – JP Ventura Sep 06 '21 at 17:54
  • I want to rename the files while maintaining the extension in full regardless of the encoding. The files range from CSV/TSV, *.txt, to compressed BAM/SAM files. There are some `*.gz` with varying suffixes (i.e. "*.pass.vcf.gz", "merged.vcf.gz"). This current method will rename with the same encoding (i.e. *.gz) while removing the descriptive suffix (i.e. *.pass.vcf.gz). – J-6474 Sep 07 '21 at 15:10
0

I would use the pathlib library, which makes dealing with file names easier. Note that in pathlib, a file name without its final suffix is called a stem.

#!/usr/bin/env python3
import csv
import pathlib
import sys


def rename_files(directory, name_map):
    directory = pathlib.Path(directory)

    with open(name_map) as stream:
        reader = csv.reader(stream)
        next(reader)  # Skip the header
        for old_name, new_name in reader:
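            # Match every file named old_name, whatever its extension
            for old_path in directory.glob(old_name + ".*"):
                # with_stem() requires Python 3.9+; it keeps only the last suffix
                new_path = old_path.with_stem(new_name)
                old_path.rename(new_path)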


if __name__ == '__main__':
    rename_files(sys.argv[1], sys.argv[2])

Example:

python rename.py /test/directory filestorename.csv

Notes

  • The key to renaming files regardless of extension is the .glob() method, which finds all files with the same name but different extensions.
  • The .with_stem() method (available from Python 3.9) takes a path and returns another path with a different stem (the file name minus its last suffix).
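
Because the stem excludes only the last suffix, a file such as barcode01.pass.vcf.gz would come out as sample01.gz with .with_stem(). A possible variant that keeps the whole extension is sketched below. The function name rename_keep_full_extension is made up for this example, and the sketch assumes the names in the CSV contain no glob special characters such as * or [.

```python
import csv
import pathlib
import tempfile


def rename_keep_full_extension(directory, name_map):
    """Rename <old>.<anything> to <new>.<anything> for each CSV row."""
    directory = pathlib.Path(directory)
    renamed = []
    with open(name_map, newline="") as stream:
        for row in csv.DictReader(stream):
            old_name, new_name = row["original"], row["new"]
            # list() so the scan is finished before any file is renamed
            for old_path in list(directory.glob(old_name + ".*")):
                # Everything after the old name, e.g. ".pass.vcf.gz"
                extension = old_path.name[len(old_name):]
                new_path = old_path.with_name(new_name + extension)
                old_path.rename(new_path)
                renamed.append(new_path.name)
    return sorted(renamed)


if __name__ == "__main__":
    # Small demonstration in a throwaway directory with made-up file names
    with tempfile.TemporaryDirectory() as tmp:
        tmp_dir = pathlib.Path(tmp)
        for file_name in ("barcode01.txt", "barcode01.pass.vcf.gz"):
            (tmp_dir / file_name).touch()
        mapping = tmp_dir / "filestorename.csv"
        mapping.write_text("original,new\nbarcode01,sample01\n")
        print(rename_keep_full_extension(tmp_dir, mapping))
```

Unlike .with_stem(), the .with_name() method used here has been part of pathlib since its introduction, so this variant does not need Python 3.9.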
Hai Vu