How to get, copy and replace a non-ascii character in file with shell script?

Question

I have a problem replacing non-ASCII characters in file names. When I try to copy such a file to run some tests, I get the error "cannot open `FileName' for reading: No such file or directory", and every non-ASCII character in the name is shown as "_". Do you know how to get the real name, or how to replace those characters with a shell script? Thanks a lot.

It is unclear to me whether you want to replace non-ascii characters in a file name (i.e. _rename_ the file) or within a text file (i.e. modify the file _content_). What did you do when you got the above error message? Please give some additional information to make it easier to help you. — mschilli, Jul 29 '13 at 12:22
I get errors. I want to replace characters in a file name because they are invalid for commands like cp or mv (that is where the error comes from). After more testing, the problem turns out to be corrupt characters: the files come from MacOS or Windows, but I now have to work with them on Linux. I am now trying to write a shell script that finds the corrupt file names and replaces the corrupt characters. — ShoxSpartan, Jul 30 '13 at 07:30
I tried to answer your question. However, I have the feeling it would be better suited to superuser than to stackoverflow. I could be wrong however, since I am quite new here. — mschilli, Aug 06 '13 at 12:47

score 0 · Answer 1 · answered Jul 29 '13 at 11:39

To strip the non-ASCII characters from a file, you can use the following sed statement.

sed 's/[^\d32-\d126]//g' <file_name>

This prints the input file to stdout with every character outside the printable ASCII range (decimal 32 to 126) removed. With the -i option, sed makes the change in the file itself instead of printing the result.

To replace the non-ASCII characters with a particular character, you can use the following statement.

sed 's/[^\d32-\d126]/<replacing_char>/g' <file_name>
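As a hedged sketch, the same operations can be written without the GNU-only `\d` escapes. The function names below are hypothetical and not part of the answer above; `LC_ALL=C` is an assumption made so that the tools work byte by byte, which means a multi-byte UTF-8 character counts as several non-ASCII bytes.

```shell
# Hypothetical helpers; each takes a file name as $1 and writes to stdout.

# Print only the bytes outside the 7-bit ASCII range (octal 200 to 377).
only_non_ascii() {
    LC_ALL=C tr -cd '\200-\377' < "$1"
}

# Keep tab, LF, CR and printable ASCII (octal 040 to 176); delete the rest.
strip_non_ascii() {
    LC_ALL=C tr -cd '\011\012\015\040-\176' < "$1"
}

# Replace every byte outside the range space..tilde with the character in $2.
# Note that tabs fall outside that range and are replaced too.
# $2 should be a plain character such as _ (not / or &, which sed treats specially).
replace_non_ascii() {
    LC_ALL=C sed "s/[^ -~]/$2/g" "$1"
}
```

For example, `replace_non_ascii notes.txt _` prints notes.txt with one underscore per non-ASCII byte, leaving the file itself unchanged.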

ted


Thank you a lot because I didn't know how to get limit in my character's research. – ShoxSpartan Jul 30 '13 at 07:32
I didn't understand 'get limit in my character's research'. If it's about ASCII, [32-126] covers all the printable characters on the keyboard. – ted Jul 31 '13 at 12:34
The `\d` notation is not standard; this is not properly portable. A simple equivalent regex is `[ -~]` – tripleee Jun 02 '23 at 09:46

mschilli · Answer 2 · 2015-07-07T17:05:08.510

If you know the encoding that was used on the MacOS or Windows machine that created the file, you can use convmv to convert the file name to an encoding of your liking:

Re-encode a single file name from UTF-8 to ASCII:

$ convmv -f utf8 -t ascii --notest <FILE NAME>

Re-encode a whole directory recursively from ISO8859-1 to UTF16 with Linux normalization:

$ convmv -f iso8859-1 -t utf16 --nfc -r --notest <DIRECTORY NAME>

For details see man convmv and man charsets.
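If convmv is not at hand, the same idea can be sketched for a single name with iconv. This is only an illustration under stated assumptions: the function names are hypothetical, the caller must already know the source encoding, the whole string in `$1` is re-encoded (so it is meant for names in the current directory or under an ASCII-only path), and no Unicode normalization is done.

```shell
# Hypothetical helper: print a name re-encoded from one charset to another.
# $1 = name, $2 = source encoding, $3 = target encoding
reencode_name() {
    printf '%s' "$1" | iconv -f "$2" -t "$3"
}

# Rename one file, but only if the converted name differs and is not taken.
reencode_file() {
    new=$(reencode_name "$1" "$2" "$3") || return 1
    [ "$new" = "$1" ] && return 0
    [ -e "$new" ] && { echo "skip: $new already exists" >&2; return 1; }
    mv -- "$1" "$new"
}
```

A call such as `reencode_file "$f" ISO-8859-1 UTF-8` would rename a Latin-1 encoded name to its UTF-8 form; iconv fails, and nothing is renamed, if the name is not valid in the stated source encoding.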

Addendum:

If you do not have convmv installed, you can get it on its project page on freecode.com.

score 0 · Answer 3 · answered Jun 02 '23 at 09:53

All the earlier answers here so far explain how to handle non-ASCII contents in files, not the actual file names.

Try this to rename files to replace any non-ASCII characters with literal underscore characters in Bash:

for file in *[!\ -~]*; do
    mv -i "$file" "${file//[! -~]/_}"
done

The parameter expansion ${variable//pattern/replacement} produces the value of $variable with every instance of pattern replaced with replacement; so ${file//[! -~]/_} replaces every non-ASCII character in $file with an underscore. This construct is not part of POSIX: Bash supports it (as do ksh93 and Zsh), but a plain sh such as dash does not.

For a properly POSIX-portable solution, try using sed to perform the replacement.

for file in *[!\ -~]*; do
    mv -i "$file" "$(printf '%s\n' "$file" | sed 's/[^ -~]/_/g')"
done
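Before renaming anything, it can help to preview the result. The following is a minimal sketch with hypothetical function names; it assumes that running sed under `LC_ALL=C` is acceptable, which makes sed work byte by byte, so a two-byte UTF-8 character becomes two underscores and byte sequences that are invalid in the current locale are still matched. It loops over `*` and compares names rather than relying on the glob, because bracket ranges in globs can depend on the locale.

```shell
# Hypothetical helper: print the ASCII-only version of a name.
ascii_name() {
    printf '%s\n' "$1" | LC_ALL=C sed 's/[^ -~]/_/g'
}

# Show what would be renamed in the current directory; nothing is modified.
preview_renames() {
    for file in *; do
        [ -e "$file" ] || continue          # empty directory: "*" stays literal
        new=$(ascii_name "$file")
        [ "$new" = "$file" ] && continue    # name is already plain ASCII
        if [ -e "$new" ]; then
            printf 'conflict: %s already exists\n' "$new"
        else
            printf 'would rename to: %s\n' "$new"
        fi
    done
}
```

Once the preview looks right, the same `ascii_name` call can replace the sed pipeline in the mv loop.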

However, the errors you are getting from cp etc. suggest that you probably have a quoting problem. The shell and its utilities can robustly handle any valid file name, but you need to know when to wrap quotes around a shell variable (or, more broadly, any string used as a file name etc.). See also https://mywiki.wooledge.org/BashFAQ/020
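To illustrate the quoting point, here is a small sketch; `backup_copy` is a hypothetical helper, not something from the question.

```shell
# Hypothetical helper: copy a file to "<name>.bak" next to the original.
backup_copy() {
    # Double quotes keep the name as a single argument even if it contains
    # spaces, glob characters or non-ASCII bytes. "--" ends option parsing,
    # so a name that starts with "-" is not read as an option.
    cp -- "$1" "$1.bak"
}
```

Written unquoted as `cp $1 $1.bak`, a name like `my file.txt` would be split into `my` and `file.txt`, and cp would report "No such file or directory" for names that do not exist as separate files.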
