
I have a string in C#.

How can I detect whether this string contains characters from different languages?

For example, a person fills in a text box with both his English name and his name in his local language.

I want to disallow that.

Something like this:

"Check which Unicode table each char in the string belongs to, and if the chars come from different Unicode tables, return ERROR."

But I think there would be a problem with a char like 'a', which is shared by US and UK English.

Maybe I'm wrong.

How can I recognize that a string contains more than one language?

Royi Namir
  • Do you mean *language*, *charset* or *culture info*? What platform do you use: ASP.NET, WinForms or Silverlight? Where will your application be installed? What is the purpose of the regex and unicode tags? – Caspar Kleijne Sep 24 '11 at 09:03
  • Not all chars belong to a specific language. You will need a much stronger definition of your problem. – H H Sep 24 '11 at 09:05
  • @Caspar Kleijne, thanks. I added asp.net. It's a website with a textbox that should contain chars from only one language. – Royi Namir Sep 24 '11 at 09:07
  • @Henk Holterman, so what do you suggest? I want to allow only one language... – Royi Namir Sep 24 '11 at 09:07
  • It's your problem, you will need to define it. You might get some better answers by giving more and _much better_ examples. And when you want to alert somebody, don't put a space after '@'. – H H Sep 24 '11 at 09:19
  • @Henk Holterman, I didn't know about the space part. Here is the example: "abdאבג". This string contains Hebrew and English chars, and I want to disallow that. It could be Hebrew and some other language. I think I'm very clear about that :) – Royi Namir Sep 24 '11 at 09:22

1 Answer


I think you're looking for code points: the unique numeric identifiers that Unicode assigns to characters. This question should be useful to you: "How would you get an array of Unicode code points from a .NET String?". Once you have the array of code points for the string, you can check each one against the ranges of code points you want to allow.
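A minimal sketch of that idea, assuming well-formed UTF-16 input (`char.ConvertToUtf32` throws on an unpaired surrogate). The class and method names are hypothetical, and the Hebrew block range U+0590 to U+05FF is used only as an example of "the range you want":

```csharp
using System;
using System.Collections.Generic;

public static class CodePointSketch
{
    // Returns the Unicode code points of a string, treating each
    // surrogate pair as a single code point.
    public static List<int> GetCodePoints(string text)
    {
        var result = new List<int>();
        for (int i = 0; i < text.Length; i++)
        {
            result.Add(char.ConvertToUtf32(text, i));
            if (char.IsHighSurrogate(text[i]))
            {
                i++; // skip the low half of the pair just consumed
            }
        }
        return result;
    }

    // True when every code point lies in [first, last], inclusive.
    public static bool AllInRange(string text, int first, int last)
    {
        foreach (int cp in GetCodePoints(text))
        {
            if (cp < first || cp > last)
            {
                return false;
            }
        }
        return true;
    }
}
```

For example, `CodePointSketch.AllInRange(input, 0x0590, 0x05FF)` would accept a name typed entirely in Hebrew letters and reject one that also contains Latin letters. Note that spaces and digits fall outside that range too, so a real check would have to decide how to treat them.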

Hope this helps.

Tigran
  • I tried, but I can't figure it out. How do I see whether the string contains more than one language? Can you please explain? – Royi Namir Sep 24 '11 at 12:04
  • Well, if you are talking about natural language detection, there is no easy solution: you would need a dictionary for each language and would have to guess, more or less like modern browsers do. This is complex. What you can do is create sets of code point numbers divided by alphabet (Latin, Arabic, Hindu, Russian) and identify the presence of different ALPHABETS, but not LANGUAGES. In other words, if I write a sentence in French mixed with Italian, you will never detect it (unless I use language-specific letters), but you can distinguish Latin-alphabet languages from non-Latin ones. – Tigran Sep 25 '11 at 09:02
  • Thanks. How do I distinguish Latin from non-Latin? Can you point me to an example? – Royi Namir Sep 25 '11 at 09:04
  • I think you can look here for distribution of codes: http://inamidst.com/stuff/unidata/ – Tigran Sep 25 '11 at 09:23
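
The alphabet-based check described in the comments above could be sketched roughly as follows. This is an illustration under stated assumptions, not a complete solution: the class name, method names and the table of ranges are hypothetical, the table is heavily abbreviated (a real one would need many more Unicode blocks), non-letters such as spaces and digits are ignored, and characters outside the Basic Multilingual Plane are skipped because `char.IsLetter(char)` does not classify lone surrogates as letters.

```csharp
using System;
using System.Collections.Generic;

public static class ScriptMixSketch
{
    // Abbreviated, illustrative table of Unicode ranges per alphabet.
    // An assumption for this sketch, not an exhaustive list.
    private static readonly (string Name, int First, int Last)[] Blocks =
    {
        ("Latin",      0x0041, 0x024F),
        ("Greek",      0x0370, 0x03FF),
        ("Cyrillic",   0x0400, 0x04FF),
        ("Hebrew",     0x0590, 0x05FF),
        ("Arabic",     0x0600, 0x06FF),
        ("Devanagari", 0x0900, 0x097F),
    };

    // Maps a char to the name of its range, or "Other" if not listed.
    public static string BlockOf(char c)
    {
        foreach (var block in Blocks)
        {
            if (c >= block.First && c <= block.Last)
            {
                return block.Name;
            }
        }
        return "Other";
    }

    // Collects the alphabets used by the letters of the string.
    public static HashSet<string> BlocksUsed(string text)
    {
        var used = new HashSet<string>();
        foreach (char c in text)
        {
            if (char.IsLetter(c)) // ignore digits, spaces, punctuation
            {
                used.Add(BlockOf(c));
            }
        }
        return used;
    }

    // True when letters from more than one alphabet are present.
    public static bool MixesAlphabets(string text)
    {
        return BlocksUsed(text).Count > 1;
    }
}
```

With this sketch, `ScriptMixSketch.MixesAlphabets("abdאבג")` reports a mix of Latin and Hebrew letters, while French mixed with Italian passes as a single alphabet, which is exactly the limitation Tigran describes.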