Our institution has about 28,000 articles whose encoding is not UTF-8, and I was asked to find a way to convert them to UTF-8. Is there a Linux or Windows command that changes the encoding of a file without opening it? Clearly it is not a good idea to open 28,000 files and change them one by one!
- If you don't even open the file, you can't read the data, much less rewrite it… – abarnert Oct 06 '13 at 06:57
- but I know what their encoding is – M a m a D Oct 06 '13 at 06:59
- This is not a programming question, and is off-topic here. "Is there any linux or windows command" is a question for [su]. Voting to migrate there. Good luck. – Ken White Oct 06 '13 at 07:08
- this is about shell programming so it is programming. – M a m a D Oct 06 '13 at 07:09
- And you also know the contents of all the files you want to recode without opening and reading the files? – abarnert Oct 06 '13 at 07:09
- I don't see anything related to "shell programming". I see a question asking for "linux or windows commands", which is not programming. Where is the code (or text) related to shell programming in your question? – Ken White Oct 06 '13 at 07:18
- @KenWhite Linux by itself can not do such thing, as you can see I asked for a code to do this and code in linux is shell programming – M a m a D Oct 06 '13 at 07:31
- @abarnert it is possible to read content – M a m a D Oct 06 '13 at 07:37
- Sorry, but no. :-) For the third time, you asked for a "linux or windows command", and neither of those is "code". If I open a command prompt in Windows and type `dir`, it's not code, and neither is opening a terminal window in Linux and typing `ls`. – Ken White Oct 06 '13 at 07:39
- maybe but what I'm looking for is not that much easy. I guess it will be hard enough to be views as a code – M a m a D Oct 06 '13 at 07:42
- @Mohammad: It is not possible to read the content of a file without opening that file. – abarnert Oct 06 '13 at 07:42
2 Answers
iconv can be used to convert text files from one encoding to another. Most Linux distros should have it, usually as part of glibc; if not, then as a separate installable package.
So, if they're, say, Latin-1 (ISO-8859-1), you can do something like this:
$ iconv -f ISO-8859-1 -t UTF-8 foo.txt >foo-utf8.txt
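To see what that conversion does at the byte level, here is a minimal Python sketch. Python is not part of the original answer, and the sample string is made up for illustration. It decodes Latin-1 bytes and re-encodes them as UTF-8, which is the same transformation the iconv command above performs:

```python
# Hypothetical sample: "café" stored as Latin-1 (ISO-8859-1) bytes.
latin1_bytes = "caf\u00e9".encode("iso-8859-1")

# Decode with the known source encoding, then re-encode as UTF-8.
utf8_bytes = latin1_bytes.decode("iso-8859-1").encode("utf-8")

print(latin1_bytes)  # b'caf\xe9'
print(utf8_bytes)    # b'caf\xc3\xa9'
```

The single byte 0xE9 becomes the two-byte sequence 0xC3 0xA9, which is why the source encoding has to be known up front: the bytes alone do not say how they should be decoded.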
You can wrap this up in a one-liner with find, something like:
$ find . -type f -exec sh -c 'tmp=$(mktemp) && iconv -f ISO-8859-1 -t UTF-8 "$1" >"$tmp" && cat "$tmp" >"$1"; rm -f "$tmp"' sh {} \;
But you can probably make it more readable and more robust in a half-dozen lines of bash/python/perl/whatever.
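As one possible version of that "half-dozen lines", here is a rough Python sketch. It assumes every regular file under the root is text in the same known source encoding (ISO-8859-1 here, as in the example above); the function names `convert_file` and `convert_tree` are invented for this illustration.

```python
import os
import shutil
import tempfile
from pathlib import Path


def convert_file(path, src_encoding="iso-8859-1"):
    """Rewrite one file as UTF-8, assuming it is currently in src_encoding."""
    path = Path(path)
    text = path.read_bytes().decode(src_encoding)
    # Write to a temp file in the same directory, then swap it into place,
    # so a failure midway never leaves a half-written original.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(text.encode("utf-8"))
        shutil.copymode(path, tmp_name)  # keep the original permission bits
        os.replace(tmp_name, path)
    except BaseException:
        os.unlink(tmp_name)
        raise


def convert_tree(root, src_encoding="iso-8859-1"):
    """Convert every regular file under root; return the number converted."""
    # Collect the file list first so temp files created during conversion
    # are never picked up by the directory walk.
    files = [p for p in Path(root).rglob("*") if p.is_file()]
    for path in files:
        convert_file(path, src_encoding)
    return len(files)
```

Calling `convert_tree("articles")` (with `articles` standing in for the real directory) converts the whole tree in place. Run it only once per tree and on a backup first: decoding as ISO-8859-1 never fails, so a second pass over already-converted files would silently double-encode them.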

abarnert
- thanks for reply, I will test your solution and let you know the results – M a m a D Oct 06 '13 at 07:04
You can change the encoding of a file easily with a short PowerShell script.
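# Source folder to read from and output root to write to
$filesDir = Get-ChildItem "D:\Code"
$OutputDir = "D:\programability\"
for ($j = 0; $j -lt $filesDir.Count; $j++)
{
    $SubDir = $filesDir[$j].FullName
    # Create a matching folder under the output root
    [System.IO.Directory]::CreateDirectory($OutputDir + $filesDir[$j].Name)
    $files = Get-ChildItem $SubDir
    for ($i = 0; $i -lt $files.Count; $i++) {
        $outfile = $OutputDir + $filesDir[$j].Name + "\" + $files[$i].Name
        $files[$i].Name   # print the name of the file being converted
        # Get-Content reads with PowerShell's default encoding unless -Encoding is given
        Get-Content $files[$i].FullName | Set-Content -Encoding UTF8 $outfile
    }
}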
This writes a UTF-8 copy of each file in the immediate subfolders of D:\Code to the output directory; the original files are left unchanged.

rahul garg