Just for fun, I am trying to write a batch renaming application in Python 3.6.0. It is supposed to capture each file name, split it based on a regex, and build a properly formatted new name. For testing purposes, I am only writing the proposed names to an output file until it works properly.

This is my code:

def batch_rename(self):
    if self._root is None:
        raise NotADirectoryError("self._root is empty")

    with open('output.txt', 'w') as self._open_file:
        for root, dirs, files in os.walk(self._root):
            for name in files:
                new_file = self._rename_file(root, name)
                self._add_size(root, name)
                self._open_file.write("\"{0}\" renamed to \"{1}\"\n".format(name, new_file))
                self._count += 1
            self._open_file.write("\n")

        self._open_file.write("Total files: {0}\n".format(self._count))
        self._open_file.write("Total size: {0}\n".format(self._get_total_size()))

def _rename_file(self, root_path, file_name):
    file_name = bytes(file_name, 'utf-8').decode('utf-8', 'ignore')
    # file_name = ''.join(x for x in file_name if x in string.printable)
    split_names = re.split(pattern=self._re, string=file_name)

    if len(split_names) > 1:
        new_file = self._prefix + ' ' + ''.join(split_names)
    else:
        new_file = self._prefix + ' ' + '' + split_names[0]

    new_file = new_file.replace('  ', ' ')

    return new_file

I'm running into encoding issues because some file names contain characters that can't be written to the output file, such as:

  • Russian letters (odd, I know)
  • symbols like hearts, clubs, spades, etc.

The error message I received is:

Traceback (most recent call last):
  File "C:/Users/thisUser/OneDrive/Projects/Examples.Python/BatchFileRenamer/BatchFileRename2.py", line 90, in <module>
    br.batch_rename()
  File "C:/Users/thisUser/OneDrive/Projects/Examples.Python/BatchFileRenamer/BatchFileRename2.py", line 34, in batch_rename
    self._open_file.write("\"{0}\" renamed to \"{1}\"\n".format(name, new_file))
  File "C:\Users\thisUser\Anaconda3\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u2665' in position 10: character maps to <undefined>
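
The failure can be reproduced without the renamer at all. This is a minimal sketch, assuming the output file is being encoded as cp1252 (which is what the traceback shows on this machine). The sample file name is made up for illustration:

```python
# Hypothetical file name containing a heart symbol (U+2665).
name = "my \u2665 song.mp3"

try:
    # cp1252 has no mapping for U+2665, so encoding fails the same way
    # the write() call in the traceback does.
    name.encode("cp1252")
    encode_error = None
except UnicodeEncodeError as exc:
    encode_error = exc

print(type(encode_error).__name__)

# UTF-8 can represent every Unicode code point, so this succeeds.
utf8_bytes = name.encode("utf-8")
print(utf8_bytes)
```

So the file name itself is fine; the problem is the encoding used when writing it out.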

I looked through 3 SO questions / answers, but found nothing that helped.

Could anyone please help out? I'd greatly appreciate it :)

Sometowngeek
  • If you want to modify these Unicode filenames so they're just plain ASCII, take a look at [Unidecode](https://pypi.python.org/pypi/Unidecode). – PM 2Ring Aug 07 '17 at 18:52
  • https://wiki.python.org/moin/PrintFails – Aug 07 '17 at 18:56
  • did you try opening the file with `codecs` instead of open? - `with codecs.open('output.txt', 'w', 'utf-8')...`, take a look here - https://stackoverflow.com/questions/934160/write-to-utf-8-file-in-python – Dror Av. Aug 07 '17 at 19:21
  • @droravr That worked for me! If you'd like to post that as an answer, I'll mark it as the acceptable one :) – Sometowngeek Aug 08 '17 at 15:50
  • @Sometowngeek, sure, thanks :) – Dror Av. Aug 08 '17 at 16:25

1 Answer

Instead of using:

with open('output.txt', 'w') as self._open_file:

Try using:

import codecs

with codecs.open('output.txt', 'w', 'utf-8') as self._open_file:

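This way the file is written as UTF-8 rather than in the platform default encoding (cp1252 in your traceback), which has no mapping for characters such as '\u2665'.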
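
In Python 3 the built-in open() also accepts an encoding argument, so the same effect should be possible without codecs. This is a minimal sketch under that assumption. The helper name write_rename_log and the sample file names are hypothetical, and a temporary directory stands in for the real output location:

```python
import os
import tempfile


def write_rename_log(path, renames):
    """Write (old, new) name pairs to path as UTF-8 text."""
    # An explicit encoding avoids falling back to the platform default
    # (cp1252 in the question's traceback), which cannot encode U+2665.
    with open(path, "w", encoding="utf-8") as log:
        for old, new in renames:
            log.write('"{0}" renamed to "{1}"\n'.format(old, new))


# Hypothetical sample data: a heart symbol and Cyrillic letters.
renames = [
    ("\u2665 song.mp3", "Prefix song.mp3"),
    ("\u043f\u0440\u0438\u0432\u0435\u0442.txt", "Prefix.txt"),
]

with tempfile.TemporaryDirectory() as tmp_dir:
    log_path = os.path.join(tmp_dir, "output.txt")
    write_rename_log(log_path, renames)
    # Read back with the same encoding to confirm a lossless round trip.
    with open(log_path, encoding="utf-8") as log:
        lines = log.read().splitlines()

print(len(lines))
```

Whichever way the file is opened, anything that reads output.txt later also needs to open it as UTF-8, otherwise the symbols will show up garbled.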

Dror Av.