I have a special character in my .txt file.

I want to substitute that special character ý with | and rename the file to .mnt from .txt.

Here is my code. It renames the file to .mnt, but does not substitute the special character:

#!/bin/sh
for i in `ls *.txt 2>/dev/null`;
do
filename=`echo "$i" | cut -d'.' -f1`
sed -i 's/\ý/\|/g' $i
mv $i ${filename}.mnt
done

How to do that?

Example:

BEGIN_RUN_SQLýDELETE FROM PRC_DEAL_TRIG WHERE DEAL_ID = '1:2:1212'
Benjamin W.
Imran Hemani

1 Answer

You have multiple problems in your code. Don't use ls in scripts and quote your variables. You should probably use $(command substitution) rather than the legacy `command substitution` syntax.

If your task is to replace ý in the file's contents -- not in its name -- sed -i is not wrong, but superfluous; just write the updated contents to the new location and delete the old file.

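#!/bin/sh
for i in *.txt
do
    # Skip the unexpanded pattern when no .txt files exist
    [ -e "$i" ] || continue
    # Strip only the trailing .txt, so names containing other dots survive
    filename=${i%.txt}
    sed 's/ý/|/g' "$i" >"${filename}.mnt" && rm "$i"
done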

If your system is configured for UTF-8, the character ý can be represented either by the byte sequence \xc3 \xbd (the precomposed U+00FD) or by the decomposed sequence \x79 \xcc \x81 (U+0079 + U+0301). You might find that the file contains one representation while your terminal prefers another. The only way to really be sure is to examine the hex bytes in the file and on your terminal. It is also entirely possible that your terminal is not capable of displaying the contents of the file exactly. Try

bash$ printf 'ý' | xxd
00000000: c3bd

bash$ head -c 16 file | xxd
00000000: 4245 4749 4e5f 5255 4e5f 5351 4cff 4445  BEGIN_RUN_SQL.DE

If (as here) you find that they are different (the latter outputs the single byte \xff between "BEGIN_RUN_SQL" and "DE") then the trivial approach won't work. Your sed may or may not support passing in literal hex sequences to say exactly what to substitute; or perhaps try e.g. Perl if not.
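If you would rather not depend on whether your sed understands hex escapes, one option is tr with an octal escape, which POSIX specifies. The following is only a sketch under some assumptions: the offending byte is the single byte \xff seen in the dump above (octal 377), and the function names replace_byte and convert_file are made up for illustration. Running tr under LC_ALL=C is meant to make it treat the data as raw bytes instead of trying to decode UTF-8.

```shell
#!/bin/sh

# Hypothetical helper: read stdin, replace every occurrence of one raw byte
# (given as an octal number, e.g. 377 for \xff) with '|', write to stdout.
replace_byte() {
    # "\\$1" expands to a backslash followed by the octal digits,
    # which tr itself interprets as a single byte value.
    LC_ALL=C tr "\\$1" '|'
}

# Hypothetical helper: write the converted contents of NAME.txt to NAME.mnt
# and remove the original only if the conversion succeeded.
# Usage: convert_file FILE.txt OCTAL_BYTE
convert_file() {
    replace_byte "$2" <"$1" >"${1%.txt}.mnt" && rm "$1"
}
```

You could then call it from the same kind of loop as before, e.g. for i in *.txt; do [ -e "$i" ] && convert_file "$i" 377; done. If your own hex dump shows fd rather than ff, pass 375 instead. Note that this only works when the character is a single byte in the file; for a multi-byte UTF-8 sequence, tr is not the right tool.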

tripleee
  • http://shellcheck.net/ automatically diagnosed the first three problems. – tripleee Feb 12 '19 at 13:25
  • ý is still not getting replaced – Imran Hemani Feb 12 '19 at 13:37
  • So that's in the file's contents, not in its name? Can you provide the output from `locale` and a hex dump of the bytes round the problematic character? – tripleee Feb 12 '19 at 13:54
  • how do i provide that can you guide me? – Imran Hemani Feb 12 '19 at 14:22
  • See https://meta.stackoverflow.com/questions/379403/problematic-questions-about-decoding-errors – tripleee Feb 12 '19 at 14:23
  • LANG=en_US.UTF-8 LC_CTYPE="en_US.UTF-8" LC_NUMERIC="en_US.UTF-8" LC_TIME="en_US.UTF-8" LC_COLLATE="en_US.UTF-8" LC_MONETARY="en_US.UTF-8" LC_MESSAGES="en_US.UTF-8" LC_PAPER="en_US.UTF-8" LC_NAME="en_US.UTF-8" LC_ADDRESS="en_US.UTF-8" LC_TELEPHONE="en_US.UTF-8" LC_MEASUREMENT="en_US.UTF-8" LC_IDENTIFICATION="en_US.UTF-8" LC_ALL= HEX DUMP: 0000-0001: fd – Imran Hemani Feb 12 '19 at 15:48
  • And the output of `xxd troublesomefile | head -n 1` (provided the troublesome data will fit on the first line)? – tripleee Feb 12 '19 at 17:32
  • @ImranHemani The `locale` output should probably be added to your question instead of displayed in a comment. Anyway, see updated answer now. – tripleee Feb 13 '19 at 07:20