11

I am writing a script to parse multiple log files and maintain a list of the files that have been processed. When I build the list of files to process, I use os.walk and get names similar to the following:

C:/Users/Python/Documents/Logs\ServerUI04\SystemOut_13.01.01_20.22.25.log

This is created by the following code:

filesToProcess.extend(os.path.join(root, filename) for filename in filenames if logFilePatternMatch.match(filename))

It appears that "root" uses forward slashes as the separator (I am on Windows and find that more convenient), but the parts added after it are joined with backslashes, so I end up with an inconsistent path that mixes forward slashes and backslashes.

I have tried setting the separator with:

os.path.sep = "/"

and

os.sep = "/"

before the .join, but it seems to have no effect. I realize that in theory I could manipulate the string, but longer term I'd like my script to run on Unix as well as Windows, so I would prefer that it be dynamic if possible.

Am I missing something?

Update:

Based on the helpful responses below, it looks like my problem was self-inflicted. For convenience, I had set the initial path used as root like this:

logFileFolder = ['C:/Users/Python/Documents/Logs']

When I changed it to this:

logFileFolder = ['C:\\Users\\Python\\Documents\\Logs']

Everything works and my resulting file paths all use the "\" throughout. It looks like my approach was wrong in that I was trying to get Python to change behavior rather than correcting what I was setting as a value.
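
The effect described in this update can be reproduced on any platform with the `ntpath` module, which is the implementation behind `os.path` on Windows. The following is a minimal sketch, assuming the folder and file names from the question; the variable names are only illustrative.

```python
import ntpath  # Windows flavour of os.path, importable on any OS

# Root written with forward slashes, as in the original setup.
root = 'C:/Users/Python/Documents/Logs'

# join() keeps the separators already present and adds backslashes for new parts.
mixed = ntpath.join(root, 'ServerUI04', 'SystemOut_13.01.01_20.22.25.log')
print(mixed)

# Normalizing the starting folder once gives consistent results afterwards.
clean_root = ntpath.normpath(root)
consistent = ntpath.join(clean_root, 'ServerUI04', 'SystemOut_13.01.01_20.22.25.log')
print(consistent)
```

In a real script, calling `os.path.normpath` on the starting folder before passing it to `os.walk` should have the same effect on Windows while leaving the forward slashes alone on Unix, so no hard-coded backslashes would be needed.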

Thank you!

Chris
  • I'll go out on a limb and say that setting `os.sep` is probably not the right solution. – Martijn Pieters Apr 03 '13 at 17:03
  • possible duplicate of [Why not os.path.join use os.path.sep or os.sep?](http://stackoverflow.com/questions/12086224/why-not-os-path-join-use-os-path-sep-or-os-sep) – BrenBarn Apr 03 '13 at 17:03
  • As an answer to that duplicate question points out, `os.path` works by importing `posixpath` or `ntpath` depending on your OS. Interestingly, you can see in the source code of those modules that the path separator is hard-coded as a string literal inside the `join` function, so you won't be able to change it without writing your own `join` function. – BrenBarn Apr 03 '13 at 17:05
  • Where is root being set? That's kind of critical to the answer... – Justin.Wood Apr 03 '13 at 17:12
  • @Justin.Wood: root comes from the first parameter to `os.walk`, that is `os.path.join`-ed with the dirnames in yields of recursive calls to itself. – Anthon Apr 03 '13 at 18:38
  • I (think I) set root as a string to the log file folder, and in it I used forward slashes. Are you saying that if I had used (doubled) backslashes I'd probably be okay, and that the issue is that I chose to use forward slashes? I'll try changing my log folder parameter, thank you. – Chris Apr 03 '13 at 19:12

4 Answers

8

I would keep my fingers off os.sep and use os.path.normpath() on the result of combining the root and a filename:

filesToProcess.extend(
    os.path.normpath(os.path.join(root, filename))
    for filename in filenames
    if logFilePatternMatch.match(filename)
)
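
As a quick check of what normpath does to the mixed path from the question, here is a small sketch. It calls `ntpath` and `posixpath` (the Windows and Unix implementations of `os.path`) explicitly so that the result is the same on any OS; in the actual script, `os.path.normpath` would be used as shown above.

```python
import ntpath     # what os.path is on Windows
import posixpath  # what os.path is on Unix

mixed = 'C:/Users/Python/Documents/Logs\\ServerUI04\\SystemOut_13.01.01_20.22.25.log'

# On Windows, normpath rewrites every '/' as a backslash.
windows_result = ntpath.normpath(mixed)
print(windows_result)

# On Unix, a backslash is an ordinary filename character, so nothing changes.
unix_result = posixpath.normpath(mixed)
print(unix_result)
```

The second call is a reminder that normpath only normalizes to the conventions of the platform it runs on; it does not translate Windows-style paths when the script runs on Unix.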
Brad Larson
Anthon
6

I prefer the following utility function:

from os.path import sep, join

def pjoin(*args, **kwargs):
  return join(*args, **kwargs).replace(sep, '/')

It converts both variations (Linux style and Windows style) to Linux style. Python accepts '/' as a separator on both Windows and Linux.

I rejected the simplistic os.sep.join(['str','str','str']) because it does not take existing separators into account. Compare sep.join with the vanilla join (run on Windows):

In[79]: os.sep.join(['/existing/my/', 'short', 'path'])
Out[79]: '/existing/my/\\short\\path'
In[80]: os.path.join('/existing/my/', 'short', 'path')
Out[80]: '/existing/my/short\\path'

The vanilla join could be repaired with the suggested:

In[75]: os.path.normpath(os.path.join('/existing/my/', 'short', 'path'))
Out[75]: '\\existing\\my\\short\\path'

So far so good. But now consider the following scenario, where we interact with a Linux machine from Windows:

local_path = os.path.normpath(os.path.join('C:\\local\\base', 'subdir', 'filename.txt'))
remote_path = os.path.normpath(os.path.join('/remote/base', 'subdir', 'filename.txt'))
sftp_server.upload(local_path, remote_path)

The above will then fail because the SFTP server expects '/' as the separator, while on Windows os.path.normpath normalizes to '\'.

With the pjoin utility function or something similar, the same code works across operating systems, web URLs, FTP, etc.
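
For the remote-path case, one alternative that stays within the standard library is to build the remote path with `posixpath` and to convert local Windows paths with `pathlib`. This is only a sketch, under the assumption that the remote side always expects '/'; the paths are the illustrative ones from above, and no real SFTP client is involved.

```python
import posixpath
from pathlib import PureWindowsPath

# posixpath.join always uses '/', regardless of the OS running the script.
remote_path = posixpath.join('/remote/base', 'subdir', 'filename.txt')
print(remote_path)

# A Windows-style local path can be rendered with forward slashes when needed.
local_path = PureWindowsPath('C:\\local\\base', 'subdir', 'filename.txt')
print(local_path.as_posix())
```

Compared with replacing separators in a string, this keeps the choice of path flavour explicit at the point where each path is built.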

MrValdez
Tommy Strand
4

I use '/'.join([path1, path2]) to solve this problem, because '/' works well on both Windows and Linux.

oppo
  • This won't work in all cases, though. For example, `explorer.exe` only accepts path arguments specified with `\`. – kiri Oct 17 '13 at 10:28
1

You are better off not touching os.sep and os.path.sep, as they are not what os.path.join uses. You could use os.path.normpath, as suggested by Anthon. Another alternative is to write your own simple path join:

os.sep.join([i1,i2,i3])

Anthon
jurgenreza