I am trying to write a script that takes people's names as arguments and creates a folder for each name. Non-ASCII characters and whitespace in folder names can sometimes cause problems, so I want to remove them or replace them with ASCII characters. I can remove the whitespace between the first name and the surname, but I cannot figure out how to change ş->s, ç->c, ğ->g, ı->i, ö->o.

Here is my code:

#!/bin/bash

ARRAY=("$@")
ELEMENTS=${#ARRAY[@]}


for (( i=0; i<ELEMENTS; i++ ))
do  # C-like for loop syntax
    # Quote the expansion so the name is passed through unchanged
    echo "${ARRAY[$i]}" | grep "[^ ]*\b" | tr -d ' '
done

I run my script like this: myscript.sh 'Çişil Aksoy' 'Cem Dalgıç'

It should change the arguments to: CisilAksoy CemDalgic

Thanks in advance

EDIT: I found this solution. It does not look very pretty, but it works.

sed 's/ş/s/gI; s/ç/c/gI; s/ü/u/gI; s/ö/o/gI; s/ı/i/gI;'

EDIT2: SOLVED

#!/bin/bash

ARRAY=("$@")
ELEMENTS=${#ARRAY[@]}

for (( i=0; i<ELEMENTS; i++ ))
do  # C-like for loop syntax
    v=$(echo "${ARRAY[$i]}" | grep "[^ ]*\b" | tr -d ' ' | sed 's/ş/s/gI; s/ç/c/gI; s/ü/u/gI; s/ö/o/gI; s/ı/i/gI;')
    # Quote the result; "--" guards against names that start with a dash
    mkdir -- "$v"
done
Batuhan B

2 Answers

Anything that converts from UTF-8 to ASCII is going to be a compromise.

The iconv program does what was requested, although the result will not necessarily satisfy everyone (see the question "Transliterate any convertible utf8 char into ascii equivalent"). Given

 Çişil Aksoy' 'Cem Dalgıç

in "foo.txt", and the command

iconv -f UTF8 -t ASCII//TRANSLIT <foo.txt

that would give

Cisil Aksoy' 'Cem Dalg?c
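
The ? in that output comes from ı. As a minimal sketch, assuming bash and an iconv that accepts the //TRANSLIT suffix, one option is to map the Turkish dotless ı and dotted İ explicitly and leave the rest to iconv. The function name to_ascii is hypothetical, and what iconv produces for the remaining characters can differ between implementations and locales.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not part of the original answer).
to_ascii() {
    local s=$1
    # Handle the Turkish dotless/dotted i up front, because the iconv run
    # shown above printed "?" for ı.
    s=${s//ı/i}
    s=${s//İ/I}
    # Let iconv approximate whatever other non-ASCII characters remain.
    printf '%s\n' "$s" | iconv -f UTF-8 -t ASCII//TRANSLIT
}
```

It could then be called as to_ascii 'Cem Dalgıç'; the result for ç and the other accented letters is whatever the local iconv chooses.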

The lynx browser has a different set of ASCII approximations. Using this command

lynx -display_charset=us-ascii -force_html -nolist -dump foo.txt

I get this result:

C,isil Aksoy' 'Cem Dalgic,
Thomas Dickey

Simply put, you can't. ASCII only supports 128 characters. International characters typically use a Unicode encoding such as UTF-8, which can represent far more characters.

I think your best bet is to identify WHY your folder creation fails when using these characters. Does the method or function not support Unicode? If it does, figure out how to specify that instead of ASCII. If not, you might be stuck with sed and/or tr, which is probably not sustainable.

[UPDATED]

You should be able to substitute multiple characters via tr as follows:

echo şğıö | tr şçğıö scgio
sgio

(I removed my comment from earlier. I tried it on a different server and it worked fine.)
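
Whether the tr command works depends on the tr implementation: GNU coreutils tr operates on bytes, so it does not translate multibyte UTF-8 characters such as ş correctly. As a sketch that avoids tr entirely, assuming bash and the fixed set of Turkish letters from the question, the pairs can be replaced one by one with parameter expansion. The function name ascii_folder_name and the pairs list are illustrative, not part of the original script.

```shell
#!/usr/bin/env bash

# Hypothetical helper: replace a fixed set of Turkish letters with ASCII
# ones and drop spaces, using only bash parameter expansion.
ascii_folder_name() {
    local name=$1
    # Flat list of "from to" pairs; extend it if more letters are needed.
    local pairs=(ş s Ş S ç c Ç C ğ g Ğ G ı i İ I ö o Ö O ü u Ü U)
    local i from to
    for (( i = 0; i < ${#pairs[@]}; i += 2 )); do
        from=${pairs[i]}
        to=${pairs[i+1]}
        # Replace every occurrence of the non-ASCII letter.
        name=${name//"$from"/$to}
    done
    # Remove the spaces between first name and surname.
    printf '%s\n' "${name// /}"
}

# Create one folder per argument, e.g. ./myscript.sh 'Çişil Aksoy' 'Cem Dalgıç'
for person in "$@"; do
    mkdir -p -- "$(ascii_folder_name "$person")"
done
```

Unlike the sed version with the I flag, the explicit upper-case pairs keep the capital letters, so 'Çişil Aksoy' becomes CisilAksoy rather than cisilAksoy.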

Paul Calabro
  • Actually, it does not fail, but I want to change all the characters to valid ASCII ones. The set of characters I need to change is fixed (ş->s, ç->c, ı->i, ğ->g, ö->o, ü->u). – Batuhan B Oct 06 '15 at 23:33
  • sed 's/Ç/c/g; s/ş/s/g'' ' <<< 'Çişil' I think it is not a good way, but it works like this. – Batuhan B Oct 06 '15 at 23:38