On a USB drive I have files from a Windows PC that have special characters in their filenames. On macOS this is a problem because of the following behavior:
- the files can be seen with % ls (without -a)
- the files can be seen with % find .
- the files cannot be removed with rm
- the containing folder can be deleted with all its contents
- the files are not visible in Finder.app
- the containing folder cannot be copied in Finder; copying stops with the error message: The operation can’t be completed because one or more required items can’t be found. (Error code -43)
- the files cannot be renamed with % mv, which stops with the error message: No such file or directory
- I can create a file with the same name in macOS
- The file works without problem on Windows 10
- Deleting the containing folder shows them in the Trash in macOS, but Trash can not be emptied
On Windows the files work flawlessly and can be renamed; after renaming, they also work on macOS.
The problem is that the filenames contain characters made up of two Unicode code points. If I edit the filename on the USB drive on a Windows PC, I need to press delete twice to get rid of ü, because it consists of u and ¨.
Example:
On Windows, name a file on a USB stick formatted with FAT32 or exFAT using the character composed of two Unicode code points, U+0075 followed by U+0308. Then plug the USB stick into a MacBook running macOS Ventura.
% ls
ü.txt
% mv ü.txt test.txt
mv: rename ü.txt to test.txt: No such file or directory
How can I rename the files on macOS?
The USB drive is formatted as FAT32; I also tried exFAT, with the same broken behavior. The macOS volume is formatted as APFS.
Since I expect some files I get in the future to also have such filenames, I want to avoid needing a PC for that sole purpose.
Example script in Z Shell
I tried to create a working script based on the answer of hc_dev.
# Filename is removeUTF16inFilename.zsh
# A script to remove every character except ASCII letters, digits, '_', '.' and '-' from the file names. A counter is prefixed so that a name consisting only of removed characters does not end up empty.
# Issue: Windows uses UTF-16 for encoding file-names. But macOS uses UTF-8 in Unicode NFD (Normalization Form Canonical Decomposition).
# Usage: Copy the script into the folder with the problematic files, then run it with % zsh removeUTF16inFilename.zsh
count=1
for file in *
do
    # Skip the script itself
    if [[ "$file" != "removeUTF16inFilename.zsh" ]]
    then
        echo "File number: $count"
        echo "File name: $file"
        # printf avoids echo's handling of names that look like options or contain backslashes
        newFilename=$(printf '%s' "$file" | tr -cd 'A-Za-z0-9_.-')
        echo "Renaming $file to $count$newFilename"
        # -- ends option parsing, so names starting with a hyphen are handled too
        mv -- "$file" "$count$newFilename"
        ((count++))
        echo
    fi
done
It still fails with the error No such file or directory.
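To separate the two steps of the script above, the name-cleaning step can be checked on its own, without touching any file. This is only a minimal sketch: the function name sanitize_name is hypothetical, and LC_ALL=C is an added assumption so that tr works on raw bytes regardless of the terminal locale. It shows that the new name is computed as intended, which suggests the failure lies in mv locating the source file rather than in the tr filter.

```sh
#!/bin/sh
# sanitize_name is a hypothetical helper, not part of the original script.
# It keeps only ASCII letters, digits, underscore, dot and hyphen.
# LC_ALL=C makes tr operate on bytes, so both bytes of U+0308 (cc 88) are dropped.
sanitize_name() {
  printf '%s' "$1" | LC_ALL=C tr -cd 'A-Za-z0-9_.-'
}

# Build the decomposed name "u" + U+0308 + ".txt" from octal byte escapes
# (POSIX printf guarantees octal escapes, but not \x escapes).
name=$(printf 'u\314\210.txt')

sanitize_name "$name"; echo   # prints: u.txt
```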
Example script in Swift 5
import Foundation

let fileManager = FileManager.default
let currentDirectoryPath = fileManager.currentDirectoryPath

do {
    let contentsOfDirectory = try fileManager.contentsOfDirectory(atPath: currentDirectoryPath)
    var count = 1
    for filename in contentsOfDirectory {
        // Skip the script itself
        if filename == "removeUTF16inFilename.swift" {
            continue
        }
        print("File number: \(count)")
        print("Filename: \(filename)")
        // Keep only the ASCII scalars and prefix the counter
        let newFilename = String(count) + String(filename.unicodeScalars.filter{ $0.isASCII })
        print("Renaming file to \(newFilename) \n")
        let filePathOld = currentDirectoryPath + "/" + filename
        let filePathNew = currentDirectoryPath + "/" + newFilename
        try fileManager.moveItem(atPath: filePathOld, toPath: filePathNew)
        count += 1
    }
} catch {
    // The first failing rename ends the loop and is reported here
    print(error.localizedDescription)
}
print("Script finished")
Runs:
% pwd
/Volumes/BLACK/test
% ls -l
total 416
-rwx------@ 1 meuser staff 947 Dec 30 14:15 removeUTF16inFilename.swift
-rwx------ 1 meuser staff 88836 Nov 18 2020 ü.pdf
-rwx------ 1 meuser staff 0 Dec 30 10:46 ü.txt
% swift removeUTF16inFilename.swift
File number: 1
Filename: ü.txt
Renaming file to 1u.txt
“ü.txt” couldn’t be moved to “test” because either the former doesn’t exist, or the folder containing the latter doesn’t exist.
Script finished
Hexdump
735 14:51 % pwd
/Volumes/BLACK/test
736 14:51 % ls
ü.txt
737 14:51 % ls -l | hexdump -C
00000000 74 6f 74 61 6c 20 30 0a 2d 72 77 78 2d 2d 2d 2d |total 0.-rwx----|
00000010 2d 2d 20 20 31 20 6e 69 74 72 6f 20 20 73 74 61 |-- 1 nitro sta|
00000020 66 66 20 20 30 20 44 65 63 20 33 30 20 31 30 3a |ff 0 Dec 30 10:|
00000030 34 36 20 75 cc 88 2e 74 78 74 0a |46 u...txt.|
0000003b
748 14:59 % ls
ü.txt
749 14:59 % ls | hexdump -C
00000000 75 cc 88 2e 74 78 74 0a |u...txt.|
00000008
750 14:59 % echo .txt | hexdump -C
00000000 2e 74 78 74 0a |.txt.|
00000005
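0a is the line ending, so the problematic ü must be 75 cc 88, which is u (75) followed by the UTF-8 encoding of the combining diaeresis U+0308 (cc 88).
A ü that works on macOS is c3 bc, the UTF-8 encoding of the precomposed character U+00FC.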
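The two byte sequences mentioned above can be reproduced without any file involved. The following is a small sketch using only printf and od; the helper name hex_bytes is hypothetical. It only illustrates the two UTF-8 encodings of ü and makes no claim about how macOS maps them on a FAT32 or exFAT volume.

```sh
#!/bin/sh
# hex_bytes is a hypothetical helper: it prints the bytes of its argument
# as space-separated hex values.
hex_bytes() {
  # od pads its columns; unquoted expansion collapses the padding to single spaces.
  set -- $(printf '%s' "$1" | od -An -tx1)
  echo "$*"
}

# Octal escapes are used because POSIX printf does not guarantee \x escapes.
decomposed=$(printf 'u\314\210')   # U+0075 followed by U+0308
precomposed=$(printf '\303\274')   # U+00FC

hex_bytes "$decomposed"    # prints: 75 cc 88
hex_bytes "$precomposed"   # prints: c3 bc
```

Both strings render as ü in a terminal, which is why ls alone cannot tell them apart and a hexdump is needed.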
Hexdump with a file created in macOS named ü.txt and the faulty file from Windows, also named ü.txt, which does not work:
% ls
ü.txt ü.txt
% ls | hexdump -C
00000000 75 cc 88 2e 74 78 74 0a 75 cc 88 2e 74 78 74 0a |u...txt.u...txt.|
00000010
printf does not work either:
762 15:17 % ls
ü.txt
763 15:18 % ls | hexdump -C
00000000 75 cc 88 2e 74 78 74 0a |u...txt.|
00000008
764 15:18 % cp $(printf "\x75\xcc\x88").txt uuuuu.txt
cp: ü.txt: No such file or directory
Example script
The first command in the script works only if copied and used in PowerShell, not when the command is run in the script.
Visual Studio Code keeps the Unicode characters on Windows and macOS. Most apps, for example Xcode and TextEdit, automatically convert the character from U+0075 followed by U+0308 to just U+00FC.
Download from here: workupload