I'm trying to make a quick Python script to rename a bunch of files. These files were made in a Linux system on this NTFS drive, but I'm now on Windows. The naming convention looks like this:

Screenshot at 2016-12-11 21:12:56.png

The : character is illegal in Windows filenames, so the behaviour of this script is a little strange to me.

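import os

for i in os.listdir("."):
    print(i)  # listing works: names containing colons are printed
    x = i.replace(":", "-")
    comm = """mv "{}" "{}" """.format(i, x)
    os.system(comm)  # fails: mv cannot stat the original name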

In the above code, print(i) prints the filenames happily. However, when I run os.system(comm) to rename my files, I get this error:

mv: cannot stat ‘Screenshot at 2016-12-24 14:54:57.png’: No such file or directory

Firstly, I find it a little strange that Python under Windows can tell that these naughty files exist, but isn't able to actually move them. Secondly, what's the best way to get around this issue?

I've also tried shutil.move() and os.rename() with no luck. This SO question seems to discuss the issue, but seems more concerned with prevention than fixing it. I could obviously switch back to Linux and fix it, but I'm wondering if I can't fix it on Windows.

Daniel Porteous
  • you mix `"""` with `"`. does that work? should it not be `'mv "{}" "{}"'.format(i, x)` (no need for multi-line here)? – hiro protagonist Dec 24 '16 at 10:14
  • I would use `os.rename()` instead of a system call. – Jean-François Fabre Dec 24 '16 at 10:15
  • so the `mv` you are using, is it the MSYS/Cygwin flavour on windows? – Jean-François Fabre Dec 24 '16 at 10:18
  • @hiroprotagonist Yes it works, no doubt yours would too I just prefer the multi-line. @Jean `os.rename()` also doesn't work, and the `mv` is the `mv` offered in the Linux subsystem of Windows 10. Officially known as Bash on Ubuntu on Windows (Microsoft keeping it concise as always). – Daniel Porteous Dec 24 '16 at 10:21
  • @Daniel, WSL's `mv` command should require `bash -c "mv ..."`, and you'd have to use the `/mnt` folder. Run `where mv` to see what `mv` is executing. – Eryk Sun Dec 24 '16 at 23:44
  • Similar issue, unresolved: http://superuser.com/questions/31587/how-to-force-windows-to-rename-a-file-with-a-special-character – Yann Vernier Dec 25 '16 at 14:14

1 Answer

You can list them because their names are stored in the directory. You can't open them because Windows parses a colon in a path differently (it separates a drive letter or an alternate data stream name), so the files cannot be reached by the common path-based functions, including MoveFile. You basically have two options: finding a method that doesn't rely on the name, like OpenFileById, or finding an alternate name for the file, as shown by dir /x. The latter gets you the short (8.3) name, which should not contain any colons. I don't know if there's a ready function to access those names from Python, so the shortest clear (to me) workaround is executing dir /x and parsing its output.

I think paths relative to directory descriptors are as close as Python's standard library gets to the first method, but I don't know if that would be enough. The underlying FindFirstFile/FindNextFile functions do produce both names in WIN32_FIND_DATA (cFileName and cAlternateFileName), but Python expects the first one to be valid. Either method would also have made sense in PowerShell, but PowerShell appears to be wholly unaware of short names and also tracks files by name, not by ID. Otherwise FileInfo.MoveTo would've done the trick neatly.

To prevent this situation in the first place, ntfs-3g supports a windows_names option. This causes it to balk when trying to create the files.

Conclusion: as discussed in https://superuser.com/questions/31587/how-to-force-windows-to-rename-a-file-with-a-special-character there is no clear solution. All of my attempted methods (and a handful of others) have been discussed there. Probably the least messy option is to mount the disk in Linux again and rename the files from there. From Windows' point of view the filesystem is technically corrupt because the characters are invalid, but Microsoft's repair solution is deletion, not renaming.
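
For the "rename from Linux" route, a minimal sketch of the script could look like the following. It assumes the volume is mounted on Linux (where colons are legal) and that replacing each colon with a hyphen is acceptable; the function name `rename_colons` and the skip-on-collision behaviour are my own choices for illustration, not something from the question.

```python
import os


def rename_colons(directory, replacement="-"):
    """Rename every entry in `directory` whose name contains a colon.

    Returns a list of (old_name, new_name) pairs for the entries renamed.
    Meant to be run on Linux, where names containing ':' can be opened.
    """
    renamed = []
    for name in sorted(os.listdir(directory)):
        if ":" not in name:
            continue
        new_name = name.replace(":", replacement)
        src = os.path.join(directory, name)
        dst = os.path.join(directory, new_name)
        # Skip rather than overwrite if the target name is already taken.
        if os.path.exists(dst):
            print("skipping {!r}: {!r} already exists".format(name, new_name))
            continue
        os.rename(src, dst)
        renamed.append((name, new_name))
    return renamed
```

Calling `rename_colons(".")` from the directory holding the screenshots would rename them in place. Using `os.rename()` directly also avoids the quoting problems of building an `mv` command string.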

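Cygwin merely emulates the colon by substituting a character from the Unicode Private Use Area (the original code point plus 0xF000, so ':' becomes U+F03A). The name stored on disk therefore never contains a real colon.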
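
To make that mapping concrete, here is a small sketch of the offset scheme. Only the colon and the 0xF000 offset come from the discussion here; the rest of the `REMAPPED` character set and both function names are assumptions for illustration, not Cygwin's actual code.

```python
PUA_OFFSET = 0xF000

# Assumption: only the colon is confirmed above; the other characters are
# included as plausible candidates that Windows also rejects in filenames.
REMAPPED = '<>:"|?*'


def to_private_use(name):
    """Shift each remapped character into the Private Use Area."""
    return "".join(
        chr(ord(c) + PUA_OFFSET) if c in REMAPPED else c for c in name
    )


def from_private_use(name):
    """Undo to_private_use, leaving all other characters untouched."""
    out = []
    for c in name:
        original = ord(c) - PUA_OFFSET
        if original > 0 and chr(original) in REMAPPED:
            out.append(chr(original))
        else:
            out.append(c)
    return "".join(out)
```

This also explains why such a name looks like it contains a colon in some views but cannot be matched by typing ':' in cmd: the stored character is U+F03A, not U+003A.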

Yann Vernier
  • Does the NTFS filesystem driver in Linux default to creating the 8.3 DOS names? Even in Windows, creating short names can be disabled per volume or in general. If you don't have short names, unfortunately opening by file ID will not work. NTFS supports file hard links, so it immediately fails attempts to rename or delete a file by ID because the target name is ambiguous. The file has to be opened by name. – Eryk Sun Dec 24 '16 at 23:03
  • Actually, it's unlikely anything was written with Linux' ntfs driver; it's more common to use ntfs-3g. Its documentation mentions an option `windows_names` to forbid filenames like these, and that they should be valid on Windows using the Posix layer. Indeed, it looks like they are usable in Cygwin, which should be a cleaner solution. – Yann Vernier Dec 25 '16 at 02:56
  • I tried using the `windows_names` option when mounting an NTFS volume, but it didn't automatically create DOS 8.3 names. I had to manually add the short name to the file record via `setfattr -h -v "DOS_NAME.EXT" -n system.ntfs_dos_name "Windows long filename.ext"`. – Eryk Sun Dec 25 '16 at 07:35
  • Also, the NTFS filesystem driver in Windows absolutely does not support filenames containing any of the 5 wildcard characters (*?<>"). They're reserved for wildcard matching in `NtfsQueryDirectoryFile` (the system call for listing a directory). If any name contains a wildcard character, the call fails with `STATUS_FILE_CORRUPT_ERROR` (0xC0000102). Names containing control characters, slashes, pipe, and colon can be listed, but such files cannot be opened by name, not even by the new Linux subsystem under `/mnt` -- and certainly not by Cygwin, which just uses standard NT system calls. – Eryk Sun Dec 25 '16 at 08:03
  • Maybe Cygwin smuggles invalid filenames in an NTFS named stream, or some other clever workaround. The root filesystem (VolFs) in the new Linux subsystem smuggles invalid characters by escaping them, e.g. "test*?<>":\|.txt" is stored as "test#002A#003F#003C#003E#0022#003A#005C#007C.txt". – Eryk Sun Dec 25 '16 at 09:00
  • Cygwin does smuggle; it's using the Unicode Private Use Area, encoding it as 0xf000+original. This could be found by picking the character from the GUI instead of cmd. My mistake. – Yann Vernier Dec 25 '16 at 14:04
  • Running `dir /x` or `dir /X` is proving fruitless, these files never got short names. Beyond that, while it's an interesting discussion in these comments, I'm not sure about precisely what you're suggesting. – Daniel Porteous Dec 26 '16 at 01:51
  • At this point, moving the disk back to Linux and renaming there. This attempt at an answer is a failure, but at least we learned a bit. :/ – Yann Vernier Dec 26 '16 at 13:57
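
The VolFs escaping mentioned in the comments above can be sketched as follows. This is reconstructed only from the single example given there ("#" followed by four uppercase hex digits per character); the escaped character set and the function name `volfs_style_escape` are assumptions, and the sketch does not handle a literal `#` in the name.

```python
# Assumption: exactly the characters that appear in the comment's example.
ESCAPED = '*?<>":\\|'


def volfs_style_escape(name):
    """Replace each escaped character with '#' plus its 4-digit hex code."""
    return "".join(
        "#{:04X}".format(ord(c)) if c in ESCAPED else c for c in name
    )
```

Applied to the name from the comment, `test*?<>":\|.txt`, this yields the stored form quoted there.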