
My tool's input/output:

○ I am creating an output text file (.txt) named paraText.txt from VB.net.

paraText.txt will be the input file for Perl


paraText.txt contents:

Gerade innerhalb der kulturhistorischen Behandlung nimmt die Kultivierung der Zeit durch den Menschen und dessen Zeitbewusstsein einen zentralen Platz ein. Unter dem Stichwort der Zeitkultur strebt die kulturhistorische Forschung nach der anthropologischen Erkenntnissuche, welches Bewusstsein der Mensch von seiner Zeit hat, wie er mit seiner Zeit umgeht, und ob bzw. wie er sie gestaltet, sie mit Sinn auflädt und strukturiert. Dabei wird sinnfällig, dass sich jede Kultur nicht zuletzt durch ihren Umgang mit der Zeit und deren Gliederung definiert: Man unterscheidet zurückliegende und bevorstehende, teils willkürlich, teils durch gesellschaftliche bzw. naturgegebene Einflüsse eingetretene und noch zu erwartende Ereignisse. Einen Großteil dieser Ereigniskultur bildet – der — Komplex des Festlichen.


Problem:

○ When I create the output .txt file from VB.net, the text is correct:

[Screenshot: the text displayed correctly]

○ When I read the same text in Perl while debugging, it is displayed incorrectly (unformatted):

[Screenshot: the same text displayed incorrectly while debugging in Perl]

As the picture above shows, the first line is not encoded correctly.

Note: I am using the same .txt file for both input and output, but I cannot read the text correctly while debugging in Perl 5.16.3 with Komodo Edit 8.5. I use Notepad++ to view the text.
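One thing worth ruling out, since only the first line looks wrong: writing with System.Text.Encoding.UTF8 makes a .NET StreamWriter put a UTF-8 byte-order mark (bytes EF BB BF) at the start of the file. Perl's :utf8 layer does not remove it, so the first line read in Perl begins with the invisible character U+FEFF, which some editors or consoles may show as stray characters. A minimal sketch that checks the raw bytes; the script name check_bom.pl and the helper has_utf8_bom are hypothetical:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Returns true if the file begins with the UTF-8 byte-order mark (bytes EF BB BF).
# The file is opened with :raw so no decoding layer touches the bytes.
sub has_utf8_bom {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "Cannot open $path: $!";
    my $count = read($fh, my $head, 3);
    close $fh;
    return defined $count && $count == 3 && $head eq "\xEF\xBB\xBF";
}

# Usage: perl check_bom.pl paraText.txt
for my $path (@ARGV) {
    printf "%s: %s\n", $path,
        has_utf8_bom($path) ? 'starts with a UTF-8 BOM' : 'no BOM';
}
```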

What I tried:

○ I write the text file from VB.net using UTF-8 encoding:

System.Text.Encoding.UTF8

○ I also enable UTF-8 in Perl in the following ways:

use Encode;
use utf8;
use open IO => ':utf8';
use Encoding::FixLatin qw(fix_latin);
binmode STDOUT, ":utf8";
binmode STDERR, ":utf8";
binmode STDIN,  ":utf8";
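These lines do different jobs: use utf8 only tells Perl that the script's own source code is UTF-8 and has no effect on file I/O, while the open pragma and binmode set the I/O layers. A minimal sketch of a more compact equivalent, assuming the goal is to decode every file read and encode everything printed as UTF-8; it swaps in the stricter :encoding(UTF-8) layer, which validates its input, for :utf8:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;    # affects only string literals in this source file, not file I/O

# Default layers for open() in this lexical scope; :std also applies them
# to STDIN, STDOUT and STDERR, taking the place of the three binmode calls.
use open qw(:std :encoding(UTF-8));

my $word = "Großteil";        # a character string because of "use utf8"
print length($word), "\n";    # prints 8: "ß" counts as one character
```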

My code sample:

#!/usr/bin/perl -w
use strict;
use Cwd;
use HTML::Entities;
use HTML::Entities::Numbered;
use HTML::Strip;
use Encode;
use utf8;
use open IO => ':utf8';
use Encoding::FixLatin qw(fix_latin);

binmode STDOUT, ":utf8";
binmode STDERR, ":utf8";
binmode STDIN,  ":utf8";

my $indPara = getcwd()."/paraText.txt";
open(INDPARA, $indPara) || die "Indesign paraText not found on location!";
my $indesignPara = <INDPARA>;
$indesignPara = fix_latin($indesignPara);
print decode_entities($indesignPara);
close INDPARA;

# $indesignPara contains the garbled text shown in the incorrect picture above
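Because a plain text file carries no font information, one way to tell a data problem from a display problem is to print the code points of the non-ASCII characters instead of the characters themselves; the output is pure ASCII, so the editor's font and console encoding cannot distort it. A minimal sketch, assuming paraText.txt is UTF-8; the script name dump_codepoints.pl and the helper describe_non_ascii are hypothetical:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Returns a "U+XXXX" label for every non-ASCII character in a decoded string.
sub describe_non_ascii {
    my ($text) = @_;
    return map { sprintf 'U+%04X', ord($_) }
           grep { ord($_) > 127 } split //, $text;
}

# Usage: perl dump_codepoints.pl paraText.txt
for my $path (@ARGV) {
    open my $fh, '<:encoding(UTF-8)', $path or die "Cannot open $path: $!";
    my $first_line = <$fh>;
    close $fh;
    next unless defined $first_line;
    print join(' ', describe_non_ascii($first_line)), "\n";
}
```

If the list shows the expected letters (for example U+00DF for ß) and no unexpected entries, the data is fine and only the display is wrong; a leading U+FEFF would indicate a byte-order mark.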

Can anyone please help me solve this?

Thanks in advance

Vimal

user3354853
  • It seems whatever you used to display the file used a different font for the first line. Plain text files don't specify fonts, so it's not related to Perl. Try viewing the file in a different tool. – choroba Apr 22 '14 at 08:40
  • Thanks, choroba. Yes, you are right: it displays the first line of any text in a different font. I viewed the file in Sublime, where it shows correctly. I then copied both versions of the text into one file and searched for them, but I can find only one of them, and that is the real content of the text file. – user3354853 Apr 22 '14 at 08:45
  • But I still can't get the correctly formatted text. – user3354853 Apr 22 '14 at 11:35
  • I'm just going to repeat my questions from your [previous question](http://stackoverflow.com/questions/23192846/how-to-read-text-file-contents-without-loss-of-characters-in-perl): "What is the encoding of your input file? How are you reading data from the input file? How are you decoding the data you read from your input file? What encoding do you want in your output file? How are you writing data to your output file? How are you encoding the data you write to your output file?" – Dave Cross Apr 22 '14 at 12:27
  • Please give us a very short but complete sample program that demonstrates the problem. There are other things you could be missing. Also, I have a Perl Unicode primer at the end of Learning Perl. – brian d foy Apr 22 '14 at 15:05
  • What is the encoding of your input file? UTF-8. How are you reading data from the input file? As shown in the updated question. How are you decoding the data you read from your input file? As shown in the updated question. What encoding do you want in your output file? UTF-8. How are you encoding the data you write to your output file? As follows: Dim para2 As New System.IO.StreamWriter(partxt2, False, System.Text.Encoding.UTF8) – user3354853 Apr 23 '14 at 05:27

1 Answer


If you are creating the file correctly from the VB side, you shouldn't need to fix anything on the Perl side. Merely read it as UTF-8:

open INDPARA, '<:utf8', $indPara or die ...;

After that, anything you read should be ready to go.
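A slightly fuller sketch of the same approach, under two assumptions not stated above: the whole file should be read (`<INDPARA>` in scalar context returns only the first line), and a byte-order mark written by the VB.net side should be dropped. It uses a lexical filehandle and the stricter :encoding(UTF-8) layer instead of :utf8; the script name read_para.pl and the helper read_utf8_file are hypothetical:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Encode output as UTF-8 so the decoded characters print correctly.
binmode STDOUT, ':encoding(UTF-8)';

# Reads an entire UTF-8 file into one character string.
sub read_utf8_file {
    my ($path) = @_;
    open my $fh, '<:encoding(UTF-8)', $path or die "Cannot open $path: $!";
    my $text = do { local $/; <$fh> };    # slurp mode: read up to end of file
    close $fh;
    $text =~ s/\A\x{FEFF}//;              # drop a leading byte-order mark, if any
    return $text;
}

# Usage: perl read_para.pl paraText.txt
for my $path (@ARGV) {
    print read_utf8_file($path);
}
```

Decoding once on input and encoding once on output keeps the data inside Perl as plain characters, so no fix_latin step should be needed when the file really is UTF-8.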

brian d foy