
I have text files in Greek, and now I want to search them for specific words using Perl and bash. The words are like καί, τό, εἰς.

I was searching for English words and now want to replace them with Greek ones, but mostly all I get is ???. For Perl:

my %word = map { $_ => 1 } qw/name date birth/;

And for bash:

for X in name date birth
do

Can someone please help me?

V-V
  • You should give us some more information about your problem. What is the encoding of your text file and what is the encoding of your locale settings? – mikyra Feb 28 '13 at 22:56

1 Answer

#!/usr/bin/perl
use strict;
use warnings;

# Tell Perl your code is encoded using UTF-8.
use utf8;

# Tell Perl input and output is encoded using UTF-8.
use open ':std', ':encoding(UTF-8)';

my @words = qw( καί τό εἰς );

my %words = map { $_ => 1 } @words;
my $pat = join '|', map quotemeta, keys %words;

while (<>) {
   if (/$pat/) {
      print;
   }
}

Usage:

script.pl file.in >file.out
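Since the question also asks about bash, here is a rough shell counterpart to the Perl loop, offered only as a sketch. It assumes the file and the words you type are in the same encoding (for example both UTF-8), and `search_words` is a made-up helper name, not an existing command. It leans on `grep -F`, which treats each word as a fixed string, much like `quotemeta` does in the Perl version.

```shell
#!/usr/bin/env bash
# Sketch: print every line of FILE that contains at least one of the WORDs.
# search_words is a hypothetical helper name chosen for this example.
search_words() {
    if [ "$#" -lt 2 ]; then
        echo "usage: search_words FILE WORD..." >&2
        return 2
    fi
    local file=$1
    shift
    local args=()
    local w
    for w in "$@"; do
        args+=(-e "$w")    # one -e pattern per word
    done
    # -F: match the words literally instead of as regular expressions
    grep -F "${args[@]}" -- "$file"
}
```

It would be called along the lines of `search_words file.in καί τό εἰς >file.out`. Like the Perl pattern, this matches substrings, so a short word such as τό also matches inside longer words.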

Notes:

  • Make sure the source code is encoded using UTF-8 and that you include `use utf8;`.
  • Make sure you include the `use open` line and specify the appropriate encoding for your data file. (If it's not UTF-8, change it.)
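If the data file turns out not to be UTF-8, an alternative to changing the `use open` line is to convert the file first. The sketch below uses the standard `iconv` tool. The helper name `to_utf8` is made up for illustration, and ISO-8859-7 (a common legacy Greek encoding) is only an assumed default; you need to find out what your file really uses and pass that instead.

```shell
#!/usr/bin/env bash
# Sketch: write a UTF-8 copy of FILE to standard output.
# to_utf8 is a hypothetical helper; the ISO-8859-7 default is an assumption.
to_utf8() {
    local file=$1
    local from=${2:-ISO-8859-7}    # source encoding, override as needed
    iconv -f "$from" -t UTF-8 < "$file"
}
```

For example, `to_utf8 file.in ISO-8859-7 >file.utf8` would produce a copy that the script can read with the UTF-8 `use open` line unchanged.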
ikegami
  • Are you sure his file is UTF-8 encoded? My guess is that it is ISO 8859-1 or something similar, hence the whole trouble. – mikyra Feb 28 '13 at 23:01
  • @mikyra, I told him what he needs. Any deviations could be trouble. There are 4 ways he could have deviated. Speculating which combination of those he got wrong is useless. – ikegami Feb 28 '13 at 23:03
  • Sorry, I'm new to coding so I have no idea. I used Notepad++ to write it. – V-V Feb 28 '13 at 23:08
  • @ikegami: When I try to put it inside the file, I get the following: 'qw/?a?, t?, e??/;' – V-V Feb 28 '13 at 23:09
  • You mean that's what appears in your editor when you paste the text into your editor? Sorry, I don't use Notepad++, so I can't help you with problems with your editor. – ikegami Feb 28 '13 at 23:12
  • I managed to change the encoding to UTF-8, but I still get a small error: the middle letter of the last word shows up as a square. – V-V Feb 28 '13 at 23:13
  • Probably just a font issue. (Right character, just can't get displayed by your editor's font.) If so, it'll still work. – ikegami Feb 28 '13 at 23:14
  • Convert what? Nothing I said is OS-specific. You'll want to convert line endings of text files as always (say, by using `dos2unix`), but the program will work on all types of systems. – ikegami Feb 28 '13 at 23:28
  • My Perl script is reading the output of another bash script which is not saying my files as UTF-8... I am wondering whether I can call anything to convert them or save them as UTF-8. – V-V Feb 28 '13 at 23:39
  • I think you mean "... which isn't encoded using UTF-8" when you said "... which is not saying my files as UTF-8". That's not a problem. Just change the `use open` line to specify the encoding actually used. – ikegami Feb 28 '13 at 23:52
  • I have included my full code and the error I get when running it in a Linux terminal. – V-V Mar 01 '13 at 00:21
  • That error is not from Perl. This is a brand new question for which you've given no usable information. – ikegami Mar 01 '13 at 00:24
  • I will try to see where the problem is and find the right question to ask about it. Thanks for the help. – V-V Mar 01 '13 at 00:34