How do I find letters other than A-Z (for example, Lithuanian letters)?

Question

I have a text that contains letters. The program can already find all the letters from A to Z. However, the text also contains extra letters such as ą, č, ę, ė, į, š, ų, ū, ž. Do I have to create a new string that includes both English letters and the ones I need to find? If so, how would I change this function?

        public void kiek()
        {
            for (int i = 0; i < eil.Length; i++)
            {
                if (('a' <= eil[i] && eil[i] <= 'z') ||
                    ('A' <= eil[i] && eil[i] <= 'Z'))
                {
                    Rn[eil[i]]++;
                }
            }
        }

`Do I have to create a new string that includes both english letters and the ones I need to find` - yes, you do, unless you want [all of them](https://stackoverflow.com/a/28156797/11683). — GSerg, Nov 18 '20 at 18:02
Okay, but then how do I change my for loop to adapt to the new string? — CAndy, Nov 18 '20 at 18:06
Use Encoding. The characters from 0x00 to 0x7F are standard characters that include the English letters and the characters 0x80 to 0xFF are mapped to unicode characters and vary depending on the encoding method. In your case you would use Windows-1257 encoding : https://en.wikipedia.org/wiki/Windows-1257 — jdweng, Nov 18 '20 at 18:07
@jdweng C# characters are in UTF-16, and your statement about characters 0x80 to 0xFF is completely off too. Please see https://www.joelonsoftware.com/articles/Unicode.html. — GSerg, Nov 18 '20 at 18:21
you can use range of characters https://unicode-table.com/ru/blocks/latin-extended-a/ — Stanislav, Nov 18 '20 at 18:24
@GSerg : Do you know what encoding is? Did you look at my link??? Did you read my response? Why do you think the characters are unicode and not Encoding 1257????? — jdweng, Nov 18 '20 at 18:25
@jdweng Because that is [the way `char` is defined in .NET](https://learn.microsoft.com/en-us/dotnet/api/system.char?view=net-5.0) (please read the first sentence). That is a most basic fact about the .NET BCL. On top of that, please see https://www.joelonsoftware.com/articles/Unicode.html like I suggested. — GSerg, Nov 18 '20 at 18:28
@GSerg : The article says exactly what I said. Encoding saves memory by using only one byte for each character but is only able to support a limited number of unicode characters. Encoding maps 128 characters (0x80 to 0xFF) to unicode characters and uses one byte for each of these characters. — jdweng, Nov 18 '20 at 21:34

Pedro Lima · Answer 1 · 2020-11-18T18:34:18.997

3

Use `Char.IsLetter`:

    if (char.IsLetter(eil[i])) {
        // ...
    }

or alternatively,

    if (char.IsLetter(eil, i)) {
        // ...
    }
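Note that `char.IsLetter` accepts every Unicode letter, not just Lithuanian ones. The sketch below contrasts the two approaches discussed here; the class name `LetterCheck`, the helper names, and the `LithuanianExtras` string are hypothetical and only cover the nine letters listed in the question plus their uppercase forms.

```csharp
using System;

public static class LetterCheck
{
    // Assumption: the nine extra letters from the question, lowercase and uppercase.
    private const string LithuanianExtras = "ąčęėįšųūžĄČĘĖĮŠŲŪŽ";

    // True for any Unicode letter (Latin, Cyrillic, Greek, ...).
    public static bool IsAnyLetter(char c) => char.IsLetter(c);

    // True only for A-Z, a-z and the extra letters listed above.
    public static bool IsLithuanianLetter(char c) =>
        (c >= 'a' && c <= 'z') ||
        (c >= 'A' && c <= 'Z') ||
        LithuanianExtras.IndexOf(c) >= 0;

    public static void Main()
    {
        // 'Я' is a Cyrillic letter: accepted by IsAnyLetter, rejected by IsLithuanianLetter.
        foreach (char c in "až9Я")
        {
            Console.WriteLine($"{c}: any={IsAnyLetter(c)}, lithuanian={IsLithuanianLetter(c)}");
        }
    }
}
```

If the text is known to contain only Lithuanian and English characters, `char.IsLetter` alone is probably enough; the explicit string is only needed when other alphabets must be excluded.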

edited Nov 18 '20 at 18:34

answered Nov 18 '20 at 18:03



Or alternatively `char.IsLetter(eil, i)` – Rick Davin Nov 18 '20 at 18:08
But if I swap my current if statement with your given one, I get an error that says Index was outside the bounds of the array. – CAndy Nov 18 '20 at 18:10

@David That would be a problem with your for loop. Did you modify it? If you're using the one from your original question, it should work just fine. Also, are you sure the error is thrown in the if statement? It may have been thrown on the line `Rn[eil[i]]++;`. – Pedro Lima Nov 18 '20 at 18:34
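One possible cause of the out-of-bounds error, assuming `Rn` is a fixed-size array of 256 or fewer elements (its declaration is not shown in the question): the Lithuanian letters all have UTF-16 code units above 255, for example `'ž'` is U+017E (382), so `Rn[eil[i]]` would index past the end. A minimal sketch that sidesteps this by counting into a `Dictionary<char, int>` instead; the class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class LetterCounter
{
    // Counts every Unicode letter in the text, keyed by the character itself,
    // so no array sized by code point is needed.
    public static Dictionary<char, int> CountLetters(string text)
    {
        var counts = new Dictionary<char, int>();
        foreach (char c in text)
        {
            if (!char.IsLetter(c))
            {
                continue; // skip digits, punctuation, whitespace
            }

            // TryGetValue leaves 'current' at 0 when the key is absent.
            counts.TryGetValue(c, out int current);
            counts[c] = current + 1;
        }
        return counts;
    }

    public static void Main()
    {
        var counts = CountLetters("Žąsis, žąsis!");
        foreach (var pair in counts)
        {
            Console.WriteLine($"{pair.Key} (U+{(int)pair.Key:X4}): {pair.Value}");
        }
    }
}
```

Keeping the array is also an option if it is declared with `char.MaxValue + 1` elements, at the cost of allocating 65,536 counters.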
