6

I'm trying to find all text files which have the encoding iso-8859-1 and convert these to UTF-8. My attempt so far is:

find . -name '*.txt' | xargs grep 'iso-8859-1' | cut -d ':' -f1 | 
xargs iconv -f ISO-8859-1 -t UTF-8 {} > {}.converted

The (obvious) problem is that the last substitution doesn't work: the shell handles the redirection before xargs runs, so the {} after > never reaches xargs. As is, I only get one file called {}.converted, not a.txt.converted, b.txt.converted, etc. How can I make this work?

Note: I'm doing this on Cygwin, where iconv doesn't seem to support -o.

jcollado
Alexander Torstling
  • Please have a look at this [related question](http://stackoverflow.com/q/845863/183066). – jcollado Jan 24 '12 at 12:20
  • I don't know which answer to accept. e.dan's and glenn's answers are the most pragmatic, but Ole Tange's is the most aesthetically pleasing. choroba's is also quite nice. Have to think about it. – Alexander Torstling Jan 30 '12 at 13:04

5 Answers

3

If you have GNU Parallel (http://www.gnu.org/software/parallel/) installed, you can do this:

find . -name '*.txt' | parallel grep -il iso-8859-1 | parallel iconv -f ISO-8859-1 -t UTF-8 {} \> {}.converted

You can install GNU Parallel simply by running:

wget http://git.savannah.gnu.org/cgit/parallel.git/plain/src/parallel
chmod 755 parallel
cp parallel sem

Watch the intro videos for GNU Parallel to learn more: https://www.youtube.com/playlist?list=PL284C9FF2488BC6D1

Ole Tange
2

How about a for loop like:

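# grep -l prints only the names of matching files, so cut is not needed.
# Word splitting here still breaks on file names that contain whitespace.
for file in $(find . -name '*.txt' | xargs grep -l 'iso-8859-1'); do
    iconv -f ISO-8859-1 -t UTF-8 "$file" > "$file.converted"
done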
e.dan
1

Assuming none of your files have newline characters in the name, and assuming you have GNU find and xargs:

find . -name '*.txt' -print0 |
xargs -0 grep -l 'iso-8859-1' |
while IFS= read -r file; do
    iconv -f ISO-8859-1 -t UTF-8 "$file" > "$file".converted
done

With grep -l, you don't need the cut command in the pipeline.
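If newlines in file names are a concern, one possible alternative is to skip the pipeline and let find run both steps itself. This is only a sketch: the function name convert_latin1 is made up for illustration, and it starts one grep and one sh per file, so it is slower than the xargs version.

```shell
# Hypothetical helper (name is illustrative): convert every *.txt under "$1"
# that contains the string iso-8859-1.
convert_latin1() {
    # The second -exec only runs when grep -q exits 0, i.e. on a match.
    # The file name reaches sh as "$1", so the redirection is done by that
    # inner shell, once per file, and the name is never word-split.
    find "$1" -name '*.txt' \
        -exec grep -q 'iso-8859-1' {} \; \
        -exec sh -c 'iconv -f ISO-8859-1 -t UTF-8 "$1" > "$1.converted"' _ {} \;
}
```

Calling convert_latin1 . would process the current directory. The output files end in .converted, so find does not pick them up again as input.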

glenn jackman
0

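Have xargs echo the command you want to run as a string, then pipe that string to the shell. This overcomes the substitution problem because the redirection is interpreted by the shell that runs the generated command. Note that xargs needs -I{} in order to replace {}:

find . -name '*.txt' | xargs grep 'iso-8859-1' | cut -d ':' -f1 |
xargs -I{} echo "iconv -f ISO-8859-1 -t UTF-8 {} > {}.converted" | bash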
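One caveat with piping generated text to bash is that a file name containing spaces or shell metacharacters gets parsed as code. A possible variant, sketched here with a made-up function name (convert_each), hands each name to sh as a positional parameter instead of pasting it into the command string. It assumes one file name per line on stdin and no quotes or backslashes in the names, since xargs interprets those itself.

```shell
# Hypothetical helper (name is illustrative): reads file names, one per line,
# from stdin and writes NAME.converted next to each file.
convert_each() {
    # With -I{}, xargs takes each input line as one item. The item is passed
    # to sh as "$1", so the redirection target is built inside that shell.
    xargs -I{} sh -c 'iconv -f ISO-8859-1 -t UTF-8 "$1" > "$1.converted"' _ {}
}
```

It could be used as the last stage of the pipeline, for example: find . -name '*.txt' | xargs grep -l 'iso-8859-1' | convert_each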
Ihe Onwuka
0

You are almost there:

find . -name '*.txt' | xargs grep -i iso-8859-1 | cut -f1 -d: | \
xargs -I% echo iconv -f l1 -t utf8 % \> %.utf | bash
choroba