C# application on Japanese Windows OS - Present Latin as Full-Width characters

I referred to the accepted answer in the above link and am using the code below to convert a Japanese string from full width to half width, but it returns the same full-width string without converting it.

string userInput = "チヨチヨチチヨチヨチ";
string result = userInput.Normalize(NormalizationForm.FormKC);

Expected output (half width): ﾁﾖﾁﾖﾁﾁﾖﾁﾖﾁ
Actual output (full width): チヨチヨチチヨチヨチ

However, although the code above is supposed to convert a full-width string to half width, when I pass it the half-width string (ﾁﾖﾁﾖﾁﾁﾖﾁﾖﾁ), it converts it to the full-width form (チヨチヨチチヨチヨチ).

What am I doing wrong here?

Also, I don't want the above code to be executed if my string is already in half width.

How can I check if a string is half width or full width?

RP1
  • You can try something like this: System.Text.Encoding.GetEncoding(encodingName).GetByteCount(str) – Nilay Vishwakarma Jul 22 '19 at 09:22
  • I'm assuming you want to convert your string to regular ASCII characters. In that case, there's a porting of Unidecode for C# available here: https://github.com/thecoderok/Unidecode.NET – Massimo Di Saggio Jul 22 '19 at 09:31
  • Can you explain what is the meaning of "full width and half width"? – Mostafa Vatanpour Jul 22 '19 at 09:35
  • @Mostafa Ｔｈｉｓ ｉｓ ｆｕｌｌ ｗｉｄｔｈ. This is half width. – ProgrammingLlama Jul 22 '19 at 09:41
  • Half-width is like this: 1、2、3、4、5 ... a、b、c (half-width costs 1 byte). If we change half-width to full-width: 1、2、3、4、5 ... a、b、c → １、２、３、４、５ ... ａ、ｂ、ｃ (full-width costs 2 bytes). – VAT Jul 22 '19 at 09:44
  • thanks @John and VAT . I found more info in: https://en.wikipedia.org/wiki/Halfwidth_and_fullwidth_forms – Mostafa Vatanpour Jul 22 '19 at 10:27
  • I tested your code and it does not convert half width to full width. I tested using .NET Framework 4.0 and .NET Core. Can you share code that shows the issue? – Mostafa Vatanpour Jul 22 '19 at 10:36
  • I tested again with your updated question, but it works correctly and does not convert half width to full width. I think there is another issue in your code, for example not assigning the result to a new string. A string is immutable and does not change in place. Please share your test code. – Mostafa Vatanpour Jul 22 '19 at 12:36
  • @Mostafa [this](https://rextester.com/DSAY97457) seems to reproduce OP's results. – ProgrammingLlama Jul 22 '19 at 12:40
  • @John the normalized form is チヨチヨチ. I shared the code here: https://rextester.com/ERSR23716 . You can see that it does not reverse the conversion. – Mostafa Vatanpour Jul 22 '19 at 12:49
  • @Mostafa Yeah, you seem to be right. – ProgrammingLlama Jul 22 '19 at 12:52
  • It seems that the `Normalize` method does not always convert full width to half width; it sometimes converts half width to full width. I think there is a standard for this that should be studied, but working with standards is the better choice. – Mostafa Vatanpour Jul 22 '19 at 13:36
  • @Mostafa and John I appreciate your help. I've modified the question by adding more details so it will be helpful for someone. Thank you. – RP1 Jul 22 '19 at 14:32
  • I changed my answer. – Mostafa Vatanpour Jul 22 '19 at 15:14
  • Why don't you accept my answer? – Mostafa Vatanpour Aug 01 '19 at 07:01
  • @MostafaVatanpour Your answer was helpful to understand why my code isn't converting to half-width but it didn't answer the main question raised as per the title. How to "check if a string is half width or full width"? – RP1 Aug 15 '19 at 08:39
  • I changed my answer according to your comment. – Mostafa Vatanpour Aug 15 '19 at 10:24

1 Answer

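According to this document, the Normalize method works as expected. Its job is to convert characters to their standard forms so that binary comparison can be applied correctly, not to convert everything to half width. Under FormKC, full-width Latin letters and digits become ASCII (half width), but half-width katakana become standard full-width katakana. That is why your full-width kana is returned unchanged and your half-width kana comes back as full width.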
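
To see the two directions side by side, here is a minimal sketch. The class name `NormalizeDemo` and the helper `Nfkc` are made up for illustration; only `string.Normalize` and `NormalizationForm.FormKC` come from the question.

```csharp
using System;
using System.Text;

public static class NormalizeDemo
{
    // Hypothetical helper: a thin wrapper around the call used in the question.
    public static string Nfkc(string s) => s.Normalize(NormalizationForm.FormKC);

    public static void Main()
    {
        Console.OutputEncoding = Encoding.UTF8;

        // Full-width Latin letters and digits are folded to ASCII.
        Console.WriteLine(Nfkc("ＡＢＣ１２３"));

        // Half-width katakana are folded to standard (full-width) katakana.
        Console.WriteLine(Nfkc("ﾁﾖ"));

        // Full-width katakana are already the standard form and stay as they are.
        Console.WriteLine(Nfkc("チヨ"));
    }
}
```

So FormKC can serve as a half-width to full-width converter for kana, but not as a full-width to half-width one.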

But if you want a custom conversion that always converts full-width to half-width, you can create a Dictionary to map full-width to half-width characters. This link may be helpful to create this map.

If you want to be sure that a string is half width, reject it as soon as it contains any full-width character. Build a collection of all full-width characters (Latin and Japanese), then look up each character of the string under test in that collection.
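
As an alternative to listing every character, the same check can be sketched with Unicode ranges. This is only a sketch under stated assumptions: the class and method names are hypothetical, and it treats just the full-width ASCII variants (U+FF01 to U+FF5E), the ideographic space (U+3000) and the katakana block (U+30A0 to U+30FF) as full width. Hiragana and kanji, which have no half-width forms, are deliberately not covered.

```csharp
using System;
using System.Linq;

public static class WidthCheck
{
    // Assumption: "full width" means full-width ASCII variants,
    // the ideographic space, or a character from the katakana block.
    public static bool IsFullWidthChar(char c) =>
        (c >= '\uFF01' && c <= '\uFF5E') ||
        (c >= '\u30A0' && c <= '\u30FF') ||
        c == '\u3000';

    // Half-width katakana and half-width CJK punctuation: U+FF61..U+FF9F.
    public static bool IsHalfWidthKana(char c) => c >= '\uFF61' && c <= '\uFF9F';

    // True if at least one character falls in the full-width ranges above.
    public static bool ContainsFullWidth(string s) => s.Any(IsFullWidthChar);
}
```

With this, `WidthCheck.ContainsFullWidth(userInput)` can guard the conversion so it only runs when there is something to convert.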

I wrote the isHalfWidthString method for this purpose and also added a full-width to half-width converter method, in case it is helpful:

    using System.Collections.Generic;
    using System.Text;

    public class FullWidthCharactersHandler
    {
        // Maps each full-width character to its half-width counterpart.
        static Dictionary<char, char> fullWidth2halfWidthDic;
        static FullWidthCharactersHandler()
        {
            fullWidth2halfWidthDic = new Dictionary<char, char>();
            // The two strings must line up position by position.
            string fullWidthChars = "アイウエオカキクケコサシスセソタチツテトナニヌネノハヒフヘホマミムメモヤユヨラリルレロワヲンッァィゥェォャュョ゙゚ー０１２３４５６７８９ＡＢＣＤＥＦＧＨＩＪＫＬＭＮＯＰＱＲＳＴＵＶＷＸＹＺａｂｃｄｅｆｇｈｉｊｋｌｍｎｏｐｑｒｓｔｕｖｗｘｙｚ";
            string halfWidthChars = "ｱｲｳｴｵｶｷｸｹｺｻｼｽｾｿﾀﾁﾂﾃﾄﾅﾆﾇﾈﾉﾊﾋﾌﾍﾎﾏﾐﾑﾒﾓﾔﾕﾖﾗﾘﾙﾚﾛﾜｦﾝｯｧｨｩｪｫｬｭｮﾞﾟｰ0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";
            for (int i = 0; i < fullWidthChars.Length; i++)
            {
                fullWidth2halfWidthDic.Add(fullWidthChars[i], halfWidthChars[i]);
            }
        }

        public static bool isHalfWidthString(string toTestString)
        {
            bool isHalfWidth = true;
            foreach (char ch in toTestString)
            {
                if (fullWidth2halfWidthDic.ContainsKey(ch))
                {
                    isHalfWidth = false;
                    break;
                }
            }
            return isHalfWidth;
        }

        public static string convertFullWidthToHalfWidth(string theString)
        {
            StringBuilder sbResult = new StringBuilder(theString);
            for (int i = 0; i < theString.Length; i++)
            {
                if (fullWidth2halfWidthDic.ContainsKey(theString[i]))
                {
                    sbResult[i] = fullWidth2halfWidthDic[theString[i]];
                }
            }
            return sbResult.ToString();
        }
    }

To test it, use this link.

I updated the code to use Dictionary for better performance.
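
One limitation of a character-to-character map is that voiced kana such as ガ or パ have no single half-width character: their half-width form is a base kana followed by a separate half-width dakuten (ﾞ) or handakuten (ﾟ). A possible way to handle them, sketched below, is to decompose the string with `NormalizationForm.FormD` first, so that ガ becomes カ plus a combining mark, and then map each piece. The class name `VoicedKanaSketch` and its three-entry kana map are hypothetical and intentionally tiny; a real map would cover every kana.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class VoicedKanaSketch
{
    // Illustrative subset only; extend with the full kana table in practice.
    static readonly Dictionary<char, char> Map = new Dictionary<char, char>
    {
        ['カ'] = 'ｶ',
        ['ハ'] = 'ﾊ',
        ['ワ'] = 'ﾜ',
        ['\u3099'] = 'ﾞ', // combining dakuten -> half-width dakuten
        ['\u309A'] = 'ﾟ', // combining handakuten -> half-width handakuten
    };

    public static string ToHalfWidth(string s)
    {
        // FormD splits a voiced kana into base kana + combining mark,
        // e.g. ガ (U+30AC) becomes カ (U+30AB) + U+3099.
        string decomposed = s.Normalize(NormalizationForm.FormD);

        var sb = new StringBuilder(decomposed.Length);
        foreach (char c in decomposed)
        {
            // Characters without a mapping are copied through unchanged.
            sb.Append(Map.TryGetValue(c, out char half) ? half : c);
        }
        return sb.ToString();
    }
}
```

Note that FormD also decomposes other composed characters in the input (accented Latin letters, voiced hiragana), so this sketch is only suitable when the input is known to be katakana, or when the result is recomposed afterwards.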

Mostafa Vatanpour
  • I’m trying using Japanese character. I’ve updated the question using sample input. – RP1 Jul 22 '19 at 11:04
  • a string is a bad idea for this purpose. Use `std::set` or `std::unordered_set` instead – phuclv Aug 15 '19 at 12:45
  • @MostafaVatanpour in C# you still have HashSet, SortedSet and many others – phuclv Aug 15 '19 at 14:28
  • @phuclv yes you are right, using hash table is a better idea and has better performance but it needs more lines of code. I changed the code. thanks. – Mostafa Vatanpour Aug 15 '19 at 16:17
  • @everyone I accepted this answer because it is the only solution (using a dictionary) that worked for converting half-width kana to full-width kana or vice versa. I implemented it using a Dictionary because the half-width forms of some full-width kana characters include a dakuten (ﾞ) or handakuten (ﾟ) and are therefore 2 characters, e.g. full-width ヷ, half-width ﾜﾞ. – RP1 Feb 13 '20 at 04:41