I have a page on my website which gets its main content from an old mainframe. The content encoding from the mainframe is windows-1255 (Hebrew). My website's encoding is UTF-8.

At first I used an iframe to display the response received from the mainframe. With that solution I had no problem setting the encoding of the page and the characters displayed fine, but I had some problems styling the page responsively (my whole website is responsive).

Then I tried fetching the content with file_get_contents and inserting it in the right place, but all the characters looked like this: ����� ��. I then converted the content:

$content = iconv("cp1255", "UTF-8", file_get_contents("my_url"));

The result of that was reversed Hebrew. For example the word "nice" appears as "ecin". The content also includes HTML tags, not only Hebrew text, so I can't simply reverse the text with hebrev.

I saw that in PHP 4 the function fribidi_log2vis exists, which seems to solve my problem, but it's not supported in PHP 5 (I'm working with PHP 5.3.3).

Is there a better way to handle this than loading the content into an iframe?

UPDATE

I tried fetching a test file that I created (encoded as windows-1255) and my original code works OK. I suspect that the content I'm getting is not windows-1255, at least not in terms of the order of the Hebrew letters. The conversion on the mainframe might be the cause. I'll have to look into that (I have to wait until Sunday because I don't have direct access to the server).

Itay Gal
  • Have you tried [mb_convert_encoding](http://us3.php.net/mb_convert_encoding)? – Machavity Jan 03 '14 at 15:46
  • @Machavity mb_convert_encoding also results in reversed text. – Itay Gal Jan 03 '14 at 15:55
  • I know nothing about Hebrew but it seems you've converted to UTF-8 quite successfully; perhaps you just need to tweak your HTML markup to inform the browser that such text must be displayed as RTL. – Álvaro González Jan 03 '14 at 15:59
  • @ÁlvaroG.Vicario I set the page to RTL. The rest of the UTF-8 text in hebrew, like my menu text, is being displayed OK, but the converted text is reversed. – Itay Gal Jan 03 '14 at 16:02
  • BTW, fribidi_log2vis() *is* supported in PHP 5, it's just not bundled with PHP any more. See the [PECL page](http://pecl.php.net/package/fribidi) for further details and even Windows downloads. – Álvaro González Jan 03 '14 at 16:08
  • First, you can cheat and only reverse hebrew substrings within the resulting string with some `preg_replace_callback`. Secondly, it appears as if the content coming from the mainframe is not `cp1255`, or the content contains some `bidi` symbols, which control the text direction. Anyhow it's hard to tell from here, but if you could upload an example file content we might be able to help further – Alex Jan 03 '14 at 19:54
  • Try this: http://stackoverflow.com/questions/20600843/php-ziparchive-non-english-filenames-return-funky-filenames-within-archive –  May 12 '14 at 11:16
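
The `preg_replace_callback` idea from the comments could look roughly like the sketch below. It assumes the converted text is valid UTF-8 whose Hebrew runs are stored in visual (reversed) order; the function names `reverse_utf8`, `reverse_hebrew_match` and `reverse_hebrew_runs` are made up for illustration. It only reverses runs of Hebrew letters joined by single spaces, so HTML tags, Latin text and digits stay where they are. Punctuation, numbers inside Hebrew sentences and lines that wrap are not handled, so treat it as a starting point rather than a full bidi implementation.

```php
<?php
// Reverse a UTF-8 string by code point; strrev() would split multibyte characters.
function reverse_utf8($text)
{
    $chars = preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY);
    return implode('', array_reverse($chars));
}

// Callback for preg_replace_callback(): $match[0] holds one Hebrew run.
function reverse_hebrew_match($match)
{
    return reverse_utf8($match[0]);
}

// Reverse only runs of characters from the Hebrew block (U+0590..U+05FF),
// optionally joined by single spaces. Everything else is left untouched.
function reverse_hebrew_runs($html)
{
    $pattern = '/[\x{0590}-\x{05FF}]+(?: [\x{0590}-\x{05FF}]+)*/u';
    return preg_replace_callback($pattern, 'reverse_hebrew_match', $html);
}

// The word "shalom" written as UTF-8 bytes: first reversed, then in logical order.
$visual  = "\xD7\x9D\xD7\x95\xD7\x9C\xD7\xA9";
$logical = "\xD7\xA9\xD7\x9C\xD7\x95\xD7\x9D";

$fixed = reverse_hebrew_runs('<b>' . $visual . '</b>');
```

Here `$fixed` should equal `'<b>' . $logical . '</b>'`, with the `<b>` tags intact. Named functions are used instead of short array syntax or newer features so the sketch stays within what PHP 5.3 offers.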

1 Answer

The problem is that file_get_contents is getting the content with ISO 8859-1 as the character encoding. You must create a stream context with stream_context_create that requests the Windows-1255 charset, pass it to file_get_contents, and then convert the result:

$opts = array('http' => array('header' => 'Accept-Charset: windows-1255,utf-8;q=0.7,*;q=0.7'));
$context = stream_context_create($opts);

$content = file_get_contents('my_url', false, $context);
$content = iconv("cp1255", "UTF-8", $content); // iconv returns the converted string, so keep the result
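
Since the question's update suspects the content may not really be windows-1255, it may help to check which charset the server declares before converting. After an HTTP call to file_get_contents, PHP fills the local variable `$http_response_header` with the raw response header lines. The helper below is a hypothetical sketch (the name `charset_from_headers` and the sample headers are assumptions, not something the mainframe is known to send) that pulls the charset out of a Content-Type line.

```php
<?php
// Return the charset named in a Content-Type header, lowercased,
// or null when none of the header lines declares one.
function charset_from_headers($headers)
{
    $charset = null;
    foreach ($headers as $line) {
        if (preg_match('/^Content-Type:.*?charset\s*=\s*"?([^\s;"]+)/i', $line, $m)) {
            // Keep the last match: after redirects, the final response comes last.
            $charset = strtolower($m[1]);
        }
    }
    return $charset;
}

// Made-up header lines in the shape that $http_response_header uses.
$sample = array(
    'HTTP/1.1 200 OK',
    'Content-Type: text/html; charset=ISO-8859-8',
);
$declared = charset_from_headers($sample);
```

In real use you would call `charset_from_headers($http_response_header)` right after file_get_contents. If the declared charset turns out to be something like ISO-8859-8, which is commonly used for visually ordered Hebrew, that would be consistent with the reversed text described in the question. A missing charset simply returns null, in which case inspecting the raw bytes is the remaining option.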
Dmitriy.Net