Hello there. I am creating a video player with subtitle support using the MediaElement class and the SubtitlesParser library. I ran into an issue with 7 Arabic subtitle files (.srt) that are displayed as ???? or like this:

Latin lang

I tried several different encodings, but with no luck:

SubtitlesList = new SubtitlesParser.Classes.Parsers.SubParser().ParseStream(fileStream);
subLine = Encoding.UTF8.GetString(Encoding.UTF8.GetBytes(subLine));

or

 SubtitlesList = new SubtitlesParser.Classes.Parsers.SubParser().ParseStream(fileStream,Encoding.UTF8);

Then I found this, and based on the answer I used Encoding.Default ("ANSI") to parse the subtitles and then re-interpreted the decoded text:

 SubtitlesList = new SubtitlesParser.Classes.Parsers.SubParser().ParseStream(fileStream, Encoding.Default);
 var arabic = Encoding.GetEncoding(1256); 
 var latin = Encoding.GetEncoding(1252);
 foreach (var item in SubtitlesList)
 {
  List<string> lines = new List<string>();
  lines.AddRange(item.Lines.Select(line => arabic.GetString(latin.GetBytes(line))));
  item.Lines = lines;
 }

This worked on only 4 files; the rest still show ??????, and nothing I have tried so far has worked on them. This is what I have found so far:

exoplayer weird arabic persian subtitles format (this gave me a hint about the real problem).

C# Converting encoded string IÜÜæØÜÜ?E? to readable arabic (Same answer).

convert string from Windows 1256 to UTF-8 (Same answer).

How can I transform string to UTF-8 in C#? (It works for Spanish but not for Arabic).

I am also hoping to find a single solution that displays all the files correctly. Is this possible?

Please forgive my simple language; English is not my native language.

homa
  • Never use Encoding.Default except when you know the data was produced by your system. If your subtitles were saved with code page 1256 (Arabic), then use this encoding in the constructor. Converting from one ANSI encoding to another never works well, as these share only the lower 128 characters but not the upper ones. – ckuri Jul 27 '20 at 07:24
  • So that's why I was getting a few ?? in the subtitles even after converting them. I can confirm that using the Arabic encoding in the constructor gives the same result without losing some characters, thank you. I guess selecting the right encoding is the correct way to handle this. – homa Jul 27 '20 at 09:17
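
To illustrate the point made in the comments, here is a minimal sketch of decoding bytes once with the encoding they were saved in, rather than decoding with one ANSI code page and re-interpreting with another. The class and method names (Cp1256Demo, RoundTrip) are made up for this example, and the byte array only stands in for the contents of a real .srt file. The RegisterProvider call is an assumption about the runtime: it is needed on .NET Core / .NET 5+, and can probably be dropped on .NET Framework, where code page 1256 is available out of the box.

```csharp
using System;
using System.Text;

public static class Cp1256Demo
{
    // Hypothetical helper: encodes text as Windows-1256 (a stand-in for the
    // bytes of a subtitle file) and decodes it again with the same encoding.
    public static string RoundTrip(string text)
    {
        // Assumption: running on .NET Core / .NET 5+, where code page
        // encodings must be registered before GetEncoding(1256) works.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        Encoding arabic = Encoding.GetEncoding(1256);
        byte[] fileBytes = arabic.GetBytes(text);

        // Decode exactly once, with the encoding the bytes were written in.
        return arabic.GetString(fileBytes);
    }

    public static void Main()
    {
        string original = "مرحبا";
        Console.WriteLine(RoundTrip(original) == original);
    }
}
```

With the parser from the question, the equivalent would presumably be to pass Encoding.GetEncoding(1256) as the second argument of ParseStream instead of Encoding.Default, which is what the follow-up comment reports as working.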

1 Answer

I think I found the answer to my question. As a beginner I had only a basic knowledge of encodings until I found this article:

What Every Programmer Absolutely, Positively Needs To Know About Encodings And Character Sets To Work With Text

Your text editor, browser, word processor or whatever else that's trying to read the document is assuming the wrong encoding. That's all. The document is not broken, there's no magic you need to perform, you simply need to select the right encoding to display the document.

I hope this helps anyone else who is confused about the correct way to handle this. There is no reliable way to know a file's correct encoding from the file alone; only the user can know it.
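
Since only the user really knows the encoding, one possible approach is to let the user pick a code page and use it only when the file is not valid UTF-8. The sketch below is not part of SubtitlesParser; SubtitleDecoder and Decode are hypothetical names, and the strict UTF-8 check is a heuristic of my own, not a guaranteed detection method. As in any .NET Core / .NET 5+ program, the code page provider has to be registered first (an assumption about the runtime).

```csharp
using System;
using System.Text;

public static class SubtitleDecoder
{
    // Hypothetical helper: returns the text as UTF-8 when the bytes are
    // valid UTF-8, otherwise decodes with the code page chosen by the user.
    public static string Decode(byte[] bytes, int userCodePage)
    {
        try
        {
            // throwOnInvalidBytes: true makes invalid UTF-8 raise an
            // exception instead of silently producing replacement characters.
            var strictUtf8 = new UTF8Encoding(false, true);

            // GetString keeps a leading byte order mark, so strip it.
            return strictUtf8.GetString(bytes).TrimStart('\uFEFF');
        }
        catch (DecoderFallbackException)
        {
            // Assumption: .NET Core / .NET 5+, where code pages such as
            // 1256 are only available after registering this provider.
            Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
            return Encoding.GetEncoding(userCodePage).GetString(bytes);
        }
    }
}
```

For example, Decode(File.ReadAllBytes(path), 1256) would decode a UTF-8 file as UTF-8 and fall back to Windows-1256 for the others. It cannot tell one ANSI code page from another, so the user's choice still matters.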

homa