1

There are about 28,000 articles at our institution, and their encoding is not UTF-8. I was asked to find a way to convert them to UTF-8. Is there a Linux or Windows command that changes the encoding of a file without opening it? Clearly it is not a good idea to open 28,000 files and change them one by one!

M a m a D
  • 2
    If you don't even open the file, you can't read the data, much less rewrite it… – abarnert Oct 06 '13 at 06:57
  • but I know what their encoding is – M a m a D Oct 06 '13 at 06:59
  • This is not a programming question, and is off-topic here. "Is there any linux or windows command" is a question for [su]. Voting to migrate there. Good luck. – Ken White Oct 06 '13 at 07:08
  • 2
    this is about shell programming so it is programming. – M a m a D Oct 06 '13 at 07:09
  • And you also know the contents of all the files you want to recode without opening and reading the files? – abarnert Oct 06 '13 at 07:09
  • I don't see anything related to "shell programming". I see a question asking for "linux or windows commands", which is not programming. Where is the code (or text) related to shell programming in your question? – Ken White Oct 06 '13 at 07:18
  • @KenWhite Linux by itself cannot do such a thing. As you can see, I asked for code to do this, and code in Linux is shell programming – M a m a D Oct 06 '13 at 07:31
  • @abarnert it is possible to read content – M a m a D Oct 06 '13 at 07:37
  • Sorry, but no. :-) For the third time, you asked for a "linux or windows command", and neither of those is "code". If I open a command prompt in Windows and type `dir`, it's not code, and neither is opening a terminal window in Linux and typing `ls`. – Ken White Oct 06 '13 at 07:39
  • Maybe, but what I'm looking for is not that easy. I guess it will be hard enough to be viewed as code – M a m a D Oct 06 '13 at 07:42
  • @Mohammad: It is not possible to read the content of a file without opening that file. – abarnert Oct 06 '13 at 07:42

2 Answers

8

iconv converts text files from one encoding to another. Most Linux distros should have it, usually as part of glibc; if not, it is available as a separate installable package.

So, if they're, say, Latin-1 (ISO-8859-1), you can do something like this:

$ iconv -f ISO-8859-1 -t UTF-8 foo.txt >foo-utf8.txt

You can wrap this up in a one-liner with find, something like:

$ tmpdir=$(mktemp -d -t tempXXXXXX); find . -type f -exec sh -c 'iconv -f ISO-8859-1 -t UTF-8 "$1" >"$2/temp" && mv "$2/temp" "$1"' sh {} "$tmpdir" \; ; rmdir "$tmpdir"

But you can probably make it more readable and more robust in a half-dozen lines of bash/python/perl/whatever.
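As one possible take on that "half-dozen lines" version, here is a minimal Python sketch. It assumes every file under the directory is text in the same known source encoding; the function names `recode_file` and `recode_tree` are made up for illustration and are not part of any library. Each file is decoded, written to a temporary file next to it, and then swapped into place, so a file that fails to decode is left untouched.

```python
import os
import tempfile
from pathlib import Path


def recode_file(path, src_encoding, dst_encoding="utf-8"):
    """Rewrite one file in place (hypothetical helper, not a library call)."""
    path = Path(path)
    # Decoding raises UnicodeDecodeError if the bytes are not valid
    # src_encoding; nothing has been written at that point.
    text = path.read_bytes().decode(src_encoding)
    # Write to a temporary file in the same directory, then swap it into
    # place, so an interrupted run never leaves a half-written file.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(text.encode(dst_encoding))
        os.replace(tmp_name, path)
    except BaseException:
        os.unlink(tmp_name)
        raise


def recode_tree(root, src_encoding, dst_encoding="utf-8"):
    """Recode every regular file under root; return paths that failed to decode."""
    failed = []
    # sorted() lists the files up front, so temporary files created during
    # the run are never picked up by the loop.
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        try:
            recode_file(path, src_encoding, dst_encoding)
        except UnicodeDecodeError:
            failed.append(path)
    return failed
```

Calling `recode_tree("articles", "iso-8859-1")` would be the equivalent of the find one-liner, with `articles` standing in for the real directory. Two caveats: the replaced file takes on the temporary file's permissions, and a single-byte encoding such as ISO-8859-1 accepts any byte sequence, so a wrong guess about the source encoding produces garbage rather than an error. Trying it on a copy first seems wise.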

abarnert
0

You can change the encoding of a file easily with a few PowerShell commands.

# Each subfolder of D:\Code gets a matching folder under $OutputDir.
$filesDir = Get-ChildItem "D:\Code"
$OutputDir = "D:\programability\"
for ($j = 0; $j -lt $filesDir.Count; $j++)
{
    $SubDir = $filesDir[$j].FullName
    [System.IO.Directory]::CreateDirectory($OutputDir + $filesDir[$j].Name)
    $files = Get-ChildItem $SubDir
    for ($i = 0; $i -lt $files.Count; $i++) {
        $outfile = $OutputDir + $filesDir[$j].Name + "\" + $files[$i].Name
        # Print the name of the file being converted.
        $files[$i].Name
        # Read the file and write a UTF-8 copy to the output folder.
        Get-Content $files[$i].FullName | Set-Content -Encoding UTF8 $outfile
    }
}

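This writes a UTF-8 copy of each file found in the subfolders of D:\Code (one level deep) to a matching folder under D:\programability\. Get-Content reads with PowerShell's default encoding, so pass -Encoding to it as well if the source files use a different one.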
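For comparison, the same copy-to-another-folder idea can be sketched in Python so that it descends into subfolders at any depth. This is only an illustration under stated assumptions: `recode_copy_tree` is a hypothetical name, all files are assumed to be text in one known source encoding, and the output folder is assumed to lie outside the source folder.

```python
from pathlib import Path


def recode_copy_tree(src_root, dst_root, src_encoding, dst_encoding="utf-8"):
    """Write a recoded copy of every file under src_root into dst_root.

    Hypothetical helper for illustration; the originals are not modified.
    """
    src_root, dst_root = Path(src_root), Path(dst_root)
    written = []
    for src in sorted(src_root.rglob("*")):
        if not src.is_file():
            continue
        # Mirror the relative path of the source file under dst_root.
        dst = dst_root / src.relative_to(src_root)
        dst.parent.mkdir(parents=True, exist_ok=True)
        # Work on bytes so line endings are carried over unchanged.
        dst.write_bytes(src.read_bytes().decode(src_encoding).encode(dst_encoding))
        written.append(dst)
    return written
```

Because the originals stay where they are, the output can be checked before anything is overwritten.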