I have files from a Windows PC on a USB drive, and their filenames contain special characters. On macOS this is a problem because of the following behavior:

  • the files are listed by % ls (without -a)
  • the files are listed by % find .
  • the files cannot be removed with rm
  • the containing folder can be deleted with all its contents
  • the files are not visible in Finder.app
  • the containing folder cannot be copied in Finder; it stops with the error message The operation can’t be completed because one or more required items can’t be found. (Error code -43)
  • the files cannot be renamed with % mv, which stops with the error message No such file or directory
  • I can create a file with the same name in macOS
  • the files work without problems on Windows 10
  • deleting the containing folder moves them to the Trash in macOS, but the Trash cannot be emptied

On Windows the files work flawlessly and can be renamed; once renamed, they also work on macOS.

The problem is that the filenames contain characters made up of two Unicode code points. If I edit the filename on the USB drive on a Windows PC, I need to press delete twice to get rid of ü, because it consists of u and ¨.

Example:

Name a file with a character composed of two Unicode code points, U+0075 followed by U+0308, in Windows on a USB stick formatted with FAT32 or exFAT. Then plug the USB stick into a MacBook running macOS Ventura.

% ls
ü.txt
% mv ü.txt test.txt
mv: rename ü.txt to test.txt: No such file or directory

How to rename the files on macOS?

The USB drive is formatted as FAT32; I also tried exFAT, with the same broken behavior. The Mac's own drive is formatted as APFS.

Since I expect some files I get in the future to also have such filenames, I want to avoid needing a PC for that sole purpose.


Example script in Z Shell

Based on the answer of hc_dev, I tried to create a working script.

# Filename is removeUTF16inFilename.zsh
# Removes every character outside A-Za-z0-9_.- from each file name and prepends a counter,
# so the new name is never empty even if the old name contained only such characters.

# Issue: Windows uses UTF-16 for encoding file-names. But macOS uses UTF-8 in Unicode NFD (Normalization Form Canonical Decomposition).
# Usage: Copy the script into the folder with the problematic files, then run it with % zsh removeUTF16inFilename.zsh

count=1

for file in *
do
    if [[ $file != "removeUTF16inFilename.zsh" ]]
    then
        echo "File number: $count"
        echo "File name: $file"
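        # printf avoids echo's escape handling; quoting keeps the names intact in any shell
        newFilename=$(printf '%s' "$file" | tr -cd 'A-Za-z0-9_.-')
        mv -- "$file" "$count$newFilename"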
        echo "Renaming $file to $count$newFilename"
        ((count++))
        echo
    fi
done

This still fails with the error No such file or directory.


Example script in Swift 5

import Foundation

let fileManager = FileManager.default
let currentDirectoryPath = fileManager.currentDirectoryPath
do {
    let contentsOfDirectory = try fileManager.contentsOfDirectory(atPath: currentDirectoryPath)
    
    var count = 1
    for filename in contentsOfDirectory {
        if filename == "removeUTF16inFilename.swift" {
            continue
        }
        print("File number: \(count)")
        print("Filename: \(filename)")

        let newFilename =  String(count) + String(filename.unicodeScalars.filter{ $0.isASCII })
        
        print("Renaming file to \(newFilename) \n")
        let filePathOld = currentDirectoryPath + "/" + filename
        let filePathNew = currentDirectoryPath + "/" + newFilename
        try fileManager.moveItem(atPath: filePathOld, toPath: filePathNew)
        
        count = count + 1
    }
} catch {
    print(error.localizedDescription)
}

print("Script finished")

Runs:

% pwd
/Volumes/BLACK/test
% ls -l
total 416
-rwx------@ 1 meuser  staff    947 Dec 30 14:15 removeUTF16inFilename.swift
-rwx------  1 meuser  staff  88836 Nov 18  2020 ü.pdf
-rwx------  1 meuser  staff      0 Dec 30 10:46 ü.txt
% swift removeUTF16inFilename.swift 
File number: 1
Filename: ü.txt
Renaming file to 1u.txt 

“ü.txt” couldn’t be moved to “test” because either the former doesn’t exist, or the folder containing the latter doesn’t exist.
Script finished


Hexdump

735 14:51 % pwd
/Volumes/BLACK/test
736 14:51 % ls
ü.txt
737 14:51 % ls -l | hexdump -C
00000000  74 6f 74 61 6c 20 30 0a  2d 72 77 78 2d 2d 2d 2d  |total 0.-rwx----|
00000010  2d 2d 20 20 31 20 6e 69  74 72 6f 20 20 73 74 61  |--  1 nitro  sta|
00000020  66 66 20 20 30 20 44 65  63 20 33 30 20 31 30 3a  |ff  0 Dec 30 10:|
00000030  34 36 20 75 cc 88 2e 74  78 74 0a                 |46 u...txt.|
0000003b
748 14:59 % ls
ü.txt
749 14:59 % ls | hexdump -C
00000000  75 cc 88 2e 74 78 74 0a                           |u...txt.|
00000008
750 14:59 % echo .txt | hexdump -C
00000000  2e 74 78 74 0a                                    |.txt.|
00000005

0a is the trailing newline, so the problematic ü must be the bytes 75 cc 88, that is u followed by the UTF-8 encoding of U+0308. A ü that works on macOS is c3 bc, the UTF-8 encoding of U+00FC.
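As a quick cross-check of the two byte sequences, here is a minimal sketch that builds both forms of ü and prints them as hex. The variable names are made up for illustration, and octal escapes are used so that any POSIX printf accepts them.

```shell
# Decomposed form: 'u' (U+0075) followed by COMBINING DIAERESIS (U+0308).
# U+0308 is encoded in UTF-8 as the bytes cc 88 (octal 314 210).
nfd_hex=$(printf 'u\314\210' | od -An -tx1 | tr -d ' \n')

# Precomposed form: the single code point U+00FC, UTF-8 bytes c3 bc (octal 303 274).
nfc_hex=$(printf '\303\274' | od -An -tx1 | tr -d ' \n')

echo "decomposed:  $nfd_hex"   # prints "decomposed:  75cc88"
echo "precomposed: $nfc_hex"   # prints "precomposed: c3bc"
```

Both sequences are displayed as the same glyph in Terminal, which is why a hexdump is needed to tell them apart.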


Hexdump of a directory that contains both a file named ü.txt created in macOS and the faulty file named ü.txt from Windows, which does not work:

% ls
ü.txt   ü.txt

ls | hexdump -C   
00000000  75 cc 88 2e 74 78 74 0a  75 cc 88 2e 74 78 74 0a  |u...txt.u...txt.|
00000010

Using printf does not work either:

762 15:17 % ls
ü.txt
763 15:18 % ls | hexdump -C
00000000  75 cc 88 2e 74 78 74 0a                           |u...txt.|
00000008
764 15:18 % cp $(printf "\x75\xcc\x88").txt uuuuu.txt
cp: ü.txt: No such file or directory

Example script

The first command in the script works only if copied and used in PowerShell, not when the command is run in the script. Visual Studio Code preserves the two separate code points on both Windows and macOS. Most other apps, for example Xcode and TextEdit, automatically convert U+0075 followed by U+0308 to the single code point U+00FC.

Download from here: workupload

Binarian
  • Can you check which byte(s) are actually used for `ü`, by doing something like `ls -l ü.pdf | hexdump -C`, and let us know which byte values are shown for the `ü`? I tried using the ü which is also used here: https://stackoverflow.com/a/23009283/724039, but was able to rename the file... – Luuk Dec 30 '22 at 13:32
  • @Luuk see question, I have updated it – Binarian Dec 30 '22 at 13:53
  • `cp $(printf "\x75\xcc\x88").pdf uuuuu.pdf` should work. – Luuk Dec 30 '22 at 14:10
  • @Luuk :-( nope. Same error. File works fine on Windows. – Binarian Dec 30 '22 at 14:20
  • hc_dev already gave a good answer, but as an alternative here is a **hack** which I would do in such a case: I would first load all candidate file names into an array (`f=(*.txt)`), and then identify the position of the offending file in the array. Since the files are sorted by name, chances are good that the odd file is the first or the last in the array and therefore easy to locate (first element has index `1`, last element has index `-1`). Assume however that the bad file is in position 5 of the array: I would then do `mv -iv $f[5] newname.txt`. – user1934428 Jan 02 '23 at 12:04
  • Of course in your concrete case, the `ls` tells us that you have only one file, so you could simply do a `mv * uuuuu.txt`. – user1934428 Jan 02 '23 at 12:06
  • @user1934428 None of that works, the answer of hc_dev does also not work. I think Terminal changes the characters from the really used ones to ones that Terminal can display. Using mv does not find the file because the given filename is not exactly the same characters. – Binarian Jan 07 '23 at 14:47
  • Updated the question, I can create a file on macOS to have 2 files with the exact same name, when asking the Terminal. – Binarian Jan 07 '23 at 14:52
  • An example of how this file was created on Windows would be nice (yep, a good old batch file will do), so this stuff could be reproduced. – Luuk Jan 07 '23 at 15:40
  • This does not explain the behaviour. Actually, no special "magic" on the side of the terminal application can be involved when you do a _mv * uuuuu.txt_. If this does not work, it looks more like a problem with the zsh implementation. You are on macOS: Can you rename the file using the `Finder` app on the Mac? – user1934428 Jan 08 '23 at 15:34
  • Also, it is not clear to me if the broken file resides on the USB drive, or if you managed to copy the files from the USB drive to the Mac, and it is still broken there? What physical medium is it? A USB stick or a portable hard disk? – user1934428 Jan 08 '23 at 15:36
  • @user1934428 I am not able to % cp in Terminal, same error as with mv, I am not able to see the file in Finder. Updated the question with example PowerShell script where you need to copy out the command. – Binarian Jan 09 '23 at 10:56
  • I don't know what _% cp in Terminal_ means, but if you don't see the file in Finder either, I would conclude that if it exists, it is a hidden file (which would explain why the `mv` command did not work; I forgot about this possibility). The correct command for renaming the file would then be `mv *(D) uuuuu.txt`, again assuming that this is the **only** file in the directory. – user1934428 Jan 09 '23 at 11:32
  • Not able to reproduce with attached PowerShell script (which is hosted externally against SO rule...) – Luuk Jan 09 '23 at 18:42
  • @Luuk As I wrote it does not work in the script, but only when the command is copied. Since the question is closed I just give up. I can reproduce it on any PC with a freshly installed Mac. But I have to accept that some files are not possible to see when created on Windows. – Binarian Jan 09 '23 at 19:51
  • @Binarian: I executed the script on my Windows system, and got a filename: `ü.txt`, which I could see on my mac, after copying it. – Luuk Jan 09 '23 at 20:02
  • @Luuk Read what I wrote in the first line: `The first command in the script works only if copied and used in PowerShell, not when the command is run in the script.` I do not know why. – Binarian Jan 13 '23 at 06:50
  • When "it only works on the command line", you should also tell which code page that command line is using (the output of `chcp`), because that could interfere with the working of your command line. – Luuk Jan 13 '23 at 18:01
  • @Luuk I just run `New-Item -Path .\ü.txt`, and check that `ü` is a character made by two characters. To get that `ü` I can copy it from the script. – Binarian Jan 13 '23 at 18:22

1 Answer

Issue

Windows uses UTF-16 for encoding file names, whereas macOS uses UTF-8 in Unicode NFD (Normalization Form Canonical Decomposition).

Rename using tr -cd to remove non-ASCII characters

As suggested in this answer from Ask Different, use the tr command with a limited character set (e.g. ASCII). Combined with mv, this removes all unwanted non-ASCII characters from the destination file names.

Try the renaming as a dry run first, before activating it:

for file in *; do echo mv -- "$file" "$(printf '%s' "$file" | tr -cd 'A-Za-z0-9_.-')"; done

Note: To actually perform the renaming, remove the "echo " in front of the mv command.
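To see what the tr filter does to a decomposed name, here is a small sketch. The helper name sanitize_name is hypothetical, and LC_ALL=C is my assumption to make tr work byte by byte, so that both bytes of U+0308 are dropped regardless of the locale.

```shell
# Hypothetical helper: keep only the listed ASCII characters and drop everything else.
sanitize_name() {
    printf '%s' "$1" | LC_ALL=C tr -cd 'A-Za-z0-9_.-'
}

# Decomposed "ü.txt": 'u', then the two bytes of U+0308 (octal 314 210), then ".txt".
old_name=$(printf 'u\314\210.txt')
new_name=$(sanitize_name "$old_name")

echo "$new_name"   # prints "u.txt"
```

Note that the filter only computes the new name. The rename itself still has to address the old file by its original name, which is the step that fails in the question.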

Make use of the inode number

Like many Unix/Linux systems, BSD also has the concept of inode. To show the inode of files, use flag -i with ls:

ls -i
12382580 _ü.txt

You can then select the file again by its inode with find . -inum 12382580 and pass it on to the next command, e.g. using -exec, or by piping -print to xargs (or -print0 to xargs -0).
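A minimal sketch of the inode approach, run in a throwaway directory so it can be tried safely. The directory, the file names and the variable names are made up for this demo. It is not confirmed that this helps on the affected FAT32/exFAT volume: mv still receives a path from find, and the asker reports in the comments that this answer did not work for them.

```shell
# Create a scratch directory containing one file with a decomposed "ü" in its name.
demo_dir=$(mktemp -d)
touch "$demo_dir/$(printf 'u\314\210.txt')"

# ls -i prints "<inode> <name>"; keep only the inode number of the first entry.
inode=$(ls -i "$demo_dir" | awk 'NR == 1 { print $1 }')

# Select the file by inode and rename it, without ever typing its old name.
find "$demo_dir" -maxdepth 1 -inum "$inode" -exec mv {} "$demo_dir/renamed.txt" \;

ls "$demo_dir"   # prints "renamed.txt"
```

The -maxdepth 1 option keeps find from descending into subdirectories, so only entries directly inside the folder are considered.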

hc_dev
  • Might be related when dealing with Unicode filenames in Python: [macos - Unicode encoding for filesystem in Mac OS X not correct in Python? - Stack Overflow](https://stackoverflow.com/questions/9757843/unicode-encoding-for-filesystem-in-mac-os-x-not-correct-in-python) – hc_dev Dec 30 '22 at 10:04
  • It was a good idea to use `for file in *` instead of autocompletion, but still `mv` does not work. It seems that the shell or Terminal.app does not pass the filename correctly. I also tried it without quotes and within a script. I enabled all Encodings in the Terminal. – Binarian Dec 30 '22 at 11:23
  • I can create files on macOS with extravagant characters like `哦哦哦`, there the mv command works and they are shown in Finder. – Binarian Dec 30 '22 at 14:08