3

I have a list of unicode file paths in which I need to replace all umlauts with an English diacritic. For example, I would ü with ue, ä with ae and so on. I have defined a dictionary of umlauts (keys) and their diacritics (values). So I need to compare each key to each file path and where the key is found, replace it with the value. This seems like it would be simple, but I can't get it to work. Does anyone out there have any ideas? Any feedback is greatly appreciated!

code so far:

# -*- coding: utf-8 -*-

import os

def GetFilepaths(directory):
    """
    This function will generate all file names a directory tree using os.walk.
    It returns a list of file paths.
    """
    file_paths = []
    for root, directories, files in os.walk(directory):
        for filename in files:
            filepath = os.path.join(root, filename)
            file_paths.append(filepath)
    return file_paths

# dictionary of umlaut unicode representations (keys) and their replacements (values)
umlautDictionary = {u'Ä': 'Ae',
                    u'Ö': 'Oe',
                    u'Ü': 'Ue',
                    u'ä': 'ae',
                    u'ö': 'oe',
                    u'ü': 'ue'
                    }

# get file paths in root directory and subfolders
filePathsList = GetFilepaths(u'C:\\Scripts\\Replace Characters\\Umlauts')
for file in filePathsList:
    for key, value in umlautDictionary.iteritems():
        if key in file:
            file.replace(key, value) # does not work -- umlauts still in file path!
            print file
Crazy Otto
  • 125
  • 2
  • 13
  • replace does not modify in place, it returns the modified string… – 301_Moved_Permanently Oct 22 '15 at 12:52
  • 2
    Possible duplicate of [Why doesn't calling a Python string method do anything unless you assign its output?](http://stackoverflow.com/questions/9189172/why-doesnt-calling-a-python-string-method-do-anything-unless-you-assign-its-out) – 301_Moved_Permanently Oct 22 '15 at 12:53
  • 2
    I'm not sure what the proper term is, but "diacritic" refers to the two dots used to mark umlauts, not the two-letter orthographic alternative. – chepner Oct 22 '15 at 12:55

2 Answers2

6

The replace method returns a new string, it does not modify the original string.

So you would need

file = file.replace(key, value)

instead of just file.replace(key, value).


Note also that you could use the translate method to do all the replacements at once, instead of using a for-loop:

In [20]: umap = {ord(key):unicode(val) for key, val in umlautDictionary.items()}

In [21]: umap
Out[21]: {196: u'Ae', 214: u'Oe', 220: u'Ue', 228: u'ae', 246: u'oe', 252: u'ue'}

In [22]: print(u'ÄÖ'.translate(umap))
AeOe

So you could use

umap = {ord(key):unicode(val) for key, val in umlautDictionary.items()}
for filename in filePathsList:
    filename = filename.translate(umap)
    print(filename)
unutbu
  • 842,883
  • 184
  • 1,785
  • 1,677
-1

Replace the line

file.replace(key, value)

with:

file = file.replace(key, value)

This is because strings are immutable in Python.

Which means that file.replace(key, value) returns a copy of file with replacements made.

BioGeek
  • 21,897
  • 23
  • 83
  • 145