I am developing a small program in VB6 that will work with an Arabic document. I want to count how many times each Arabic letter occurs in the document.

Basic Arabic characters:

ا أ إ آ ى ؤ ئ ء ب ت ة ث ج ح خ د ذ ر ز س ش ص ض ط ظ ع غ ف ق ك ل م ن ه

Example sentence:

البيت الكسز اللتيل الزجاج الست.‏

I don't know Arabic and cannot read it.

If VB6 won't work, I can use VB.NET.

Daniel Daranas
Smith

2 Answers

2

It'll be much easier to use VB.Net.

  • VB6 has patchy support for Unicode.
  • In VB6, you'd probably need to change your PC system code page to Arabic to be able to read the document.

EDIT: Air code solution in VB.Net, partly based on this answer. It needs exception handling.

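' You may need a different character encoding here; this assumes UTF-8.
' The dictionary is declared outside the Using block so it can be read afterwards.
Dim dict As New Dictionary(Of Char, Integer)

Using sr As New IO.StreamReader("Test.txt", Text.Encoding.UTF8)
  Do Until sr.EndOfStream
    ' Read returns the next character as an Integer code; ChrW converts it to a Char.
    Dim c As Char = ChrW(sr.Read())

    If dict.ContainsKey(c) Then
      dict(c) += 1
    Else
      dict(c) = 1
    End If
  Loop
End Using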
Community
MarkJ
  • Thanks, but I am looking for a solution or an algorithm, whichever you can give me. – Smith Feb 10 '11 at 11:51
  • OK, I might be able to provide a solution in VB.Net but you will need to provide some more information. What format is the document? Is it a text file? Do you know the character encoding (e.g. UTF-8, UTF-16)? – MarkJ Feb 10 '11 at 12:36
  • It's a text file containing the characters I provided in the sample sentence above. – Smith Feb 10 '11 at 13:01
  • OK, a text file. Do you know the [character encoding](http://en.wikipedia.org/wiki/Character_encoding)? Is it ANSI, UTF-8, UTF-16...? – MarkJ Feb 10 '11 at 13:26
  • Open the file in Notepad. Hopefully you will see Arabic characters. If you see gibberish, let me know. Select File-Save As. There's an encoding entry in the "Save As" dialog and hopefully Notepad will have selected something in the drop-down. Please tell me exactly what it says. Also, if you have a moment you could read Joel on [the Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets](http://www.joelonsoftware.com/articles/Unicode.html) – MarkJ Feb 10 '11 at 17:09
  • If you only need to support just the one alphabet you can use VB6 controls if you choose a font that contains the characters, set the control's Font property's Charset property appropriately, and carefully do a double-codepage conversion from the foreign codepage and back to the current system codepage (using StrConv() twice). This is covered in the VB6 docs under the Internationalization topics. – Bob77 Feb 10 '11 at 17:35
  • @MarkJ i have already read the textfile into a string file, and am using vb.net not c# – Smith Feb 10 '11 at 18:11
  • @Bob Riemersma Thanks. How do I count the character frequency? – Smith Feb 10 '11 at 18:56
  • @smith I posted VB.Net, **definitely** not C#, although I admit there was a typo which I have just fixed. I don't have the VB.Net IDE on this phone :) so I can't actually check whether it's totally valid. Air code. – MarkJ Feb 10 '11 at 20:51
  • @MarkJ: Many thanks. I looked at the example you showed in the URL. It deals with word frequency, but I don't know how to adapt it to character frequency. Can you help me, please? – Smith Feb 11 '11 at 08:01
  • @Smith I think I **already** helped you? ... you've seen the URL. Look just underneath the URL. I already posted VB.Net code yesterday, which counts characters rather than words. – MarkJ Feb 11 '11 at 09:34
  • @Bob Thanks for posting useful information about cross-code-page work in VB6. Honestly, though, for a small program (presumably being developed from scratch), *surely* this is one area where VB.Net is definitely a better choice? You don't need to worry about it at all. – MarkJ Feb 11 '11 at 09:56
  • @Smith You still haven't told us the character encoding. Did you see the instructions in my earlier comment about Notepad? Oh yes, and what did you mean "i have already read the textfile into a string file"? I'm sorry, but what's a "string file"? And why did you think this was a reply to my question about character encoding? I'm trying to find out information about the file format you need to read. – MarkJ Feb 11 '11 at 09:58
  • @MarkJ Opening the file in Notepad shows boxed characters all through. Will this work with Unicode characters, e.g. Arabic? – Smith Feb 11 '11 at 10:54
  • @Smith My code will work with any characters but you need to know the character encoding of the file before you can read it (with anyone's code). Notepad shows boxes, pity, no [BOM](http://en.wikipedia.org/wiki/Byte_order_mark) then. What happens if you open the file in Microsoft Word? (it's good at guessing code pages) Where did you get the file from and who created it? – MarkJ Feb 11 '11 at 13:14
  • @MarkJ I read the text in VB.NET using the stream class into a variable strContents and load it in a text box, and the text displays well. If I open it in MS Word or WordPad, it displays correctly. How do I pass the variable to the stream object instead of the text file? – Smith Feb 11 '11 at 16:04
  • @smith if you can read the text in vb.net, then you've succeeded! In the code in my answer, there's a loop over characters that calculates the frequency of each char. Just take that loop and use it on the text. E.g. to loop over the chars in a string use For Each c As Char In str – MarkJ Feb 11 '11 at 18:25
  • @MarkJ: The problem here is that some characters are composed of two single characters joined together. – Smith Feb 12 '11 at 10:08
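
The last two comments (looping over a string that is already in memory, and characters made of two code units joined together) can be sketched together as follows. This is only a minimal sketch, not code from the thread: `CountTextElements` and the module name are hypothetical, and it assumes the text is already held in a String. It relies on `System.Globalization.StringInfo.GetTextElementEnumerator`, which groups a base character with any combining marks that follow it into one text element. Precomposed letters such as أ are already a single Char, so this mainly matters when the file stores a letter and its mark separately.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Globalization

Module TextElementCounter
    ' Hypothetical helper: counts each text element (a base character plus
    ' any combining marks that follow it) in the given string.
    Function CountTextElements(ByVal text As String) As Dictionary(Of String, Integer)
        Dim counts As New Dictionary(Of String, Integer)
        Dim elements As TextElementEnumerator = StringInfo.GetTextElementEnumerator(text)

        Do While elements.MoveNext()
            Dim element As String = elements.GetTextElement()
            If counts.ContainsKey(element) Then
                counts(element) += 1
            Else
                counts(element) = 1
            End If
        Loop

        Return counts
    End Function

    Sub Main()
        ' Sample input taken from the question's example sentence.
        Dim counts As Dictionary(Of String, Integer) = CountTextElements("البيت الكسز")
        For Each pair As KeyValuePair(Of String, Integer) In counts
            Console.WriteLine("{0}: {1}", pair.Key, pair.Value)
        Next
    End Sub
End Module
```

Spaces and punctuation are counted as text elements too, so filter them out if only letters are wanted. Whether the console displays the Arabic keys correctly depends on its font and output encoding.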
1

The easiest way would be to compare against an array of all Arabic characters: http://en.wikipedia.org/wiki/Arabic_alphabet
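
One way to read this suggestion, as a rough sketch rather than the answerer's own code: seed a dictionary with the letters of interest and increment only the characters found in it. `CountArabicLetters` and the module name are hypothetical. The letter list is copied from the question as posted; it does not include و or ي, so extend the constant if those should be counted.

```vbnet
Imports System
Imports System.Collections.Generic

Module ArabicLetterCounter
    ' Letters copied from the question, separated by spaces (assumption: this
    ' is the set the asker wants counted).
    Private Const ArabicLetters As String = "ا أ إ آ ى ؤ ئ ء ب ت ة ث ج ح خ د ذ ر ز س ش ص ض ط ظ ع غ ف ق ك ل م ن ه"

    ' Hypothetical helper: counts only the characters listed in ArabicLetters.
    Function CountArabicLetters(ByVal text As String) As Dictionary(Of Char, Integer)
        Dim counts As New Dictionary(Of Char, Integer)

        ' Start every listed letter at zero so absent letters are still reported.
        For Each letter As Char In ArabicLetters
            If letter <> " "c Then counts(letter) = 0
        Next

        ' Anything not in the dictionary (spaces, punctuation, other scripts) is skipped.
        For Each c As Char In text
            If counts.ContainsKey(c) Then counts(c) += 1
        Next

        Return counts
    End Function

    Sub Main()
        Dim counts As Dictionary(Of Char, Integer) = CountArabicLetters("البيت الكسز")
        For Each pair As KeyValuePair(Of Char, Integer) In counts
            Console.WriteLine("{0}: {1}", pair.Key, pair.Value)
        Next
    End Sub
End Module
```

Seeding the dictionary first means the lookup doubles as the "is this an Arabic letter" test, so no separate array search is needed.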

maadri