I need to generate hundreds of random strings that contain Chinese and Japanese characters for testing purposes. Is there a C# library that can help with this?

More detail: I found a small tool, http://string.uttool.com/default.aspx, which can generate strings from several predefined character sets. Does anyone know how it works? Is its generation logic written in JS, C#, or Java?

Micheal Wan
  • Unlikely you will need a library. Look up CJK characters for different file encodings which will let you produce these characters. – BlackBox Sep 02 '13 at 14:52
  • Is there any sample code to do this? Thanks! – Micheal Wan Sep 02 '13 at 14:54
  • I haven't got much experience with C# I'm afraid, but from what I understand, it supports Unicode throughout. All you need to do is find the range of these characters. Specifically, you can start here: http://stackoverflow.com/questions/1366068/whats-the-complete-range-for-chinese-characters-in-unicode – BlackBox Sep 02 '13 at 14:58
  • If it's just for testing, doesn't *one* Chinese character, maybe repeated a couple of times, suffice? I might be wrong in assuming this is to test whether Unicode as a whole is supported properly. – Corak Sep 02 '13 at 15:09
  • Yes, a Chinese or Japanese character should be repeated one or more times and may appear at different positions in the string. – Micheal Wan Sep 02 '13 at 15:13

2 Answers

    // Requires: using System; using System.Collections.Generic; using System.Text;
    IEnumerable<string> GetRandomStrings(int numberOfExpectedStrings, int minLength, int maxLength, Random randomizer)
    {
        // The set of characters to pick from.
        var abecedary = new char[] { 'a', 'b' };
        var strings = new List<string>();

        for (int i = 0; i < numberOfExpectedStrings; i++)
        {
            // Random.Next excludes its upper bound, so lengths fall in [minLength, maxLength).
            int lengthOfString = randomizer.Next(minLength, maxLength);
            var newString = new StringBuilder(lengthOfString);
            for (int k = 0; k < lengthOfString; k++)
            {
                // Pick one character uniformly at random.
                int randomCharPosition = randomizer.Next(0, abecedary.Length);
                newString.Append(abecedary[randomCharPosition]);
            }
            strings.Add(newString.ToString());
        }

        return strings;
    }

Replace abecedary with an array of the Chinese or Japanese characters of your choice.
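The characters do not have to be typed in one by one, because they sit in contiguous Unicode ranges. A minimal sketch, assuming the basic CJK Unified Ideographs (U+4E00 to U+9FA5) and the Hiragana letters (U+3041 to U+3096) are enough for the test; `CjkAlphabet`, `CharsInRange` and `Build` are names made up for illustration:

```csharp
using System;
using System.Linq;

static class CjkAlphabet
{
    // Hypothetical helper: every char from first to last, inclusive.
    // Only valid for code points up to U+FFFF, which fit in a single char.
    public static char[] CharsInRange(int first, int last)
    {
        return Enumerable.Range(first, last - first + 1)
                         .Select(code => (char)code)
                         .ToArray();
    }

    // Assumed ranges: U+4E00..U+9FA5 (ideographs) and U+3041..U+3096 (Hiragana).
    public static char[] Build()
    {
        return CharsInRange(0x4E00, 0x9FA5)
            .Concat(CharsInRange(0x3041, 0x3096))
            .ToArray();
    }
}
```

With this in place, `var abecedary = CjkAlphabet.Build();` could stand in for the two-letter array. Because the array is flat, each character is equally likely, so the much larger ideograph range dominates the output.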

Uri
  • I'd put min and max length as parameters, so they can be adjusted from the outside. Also, `var randomizer = new Random();` should probably be outside of the method (maybe a static variable), or you could end up with the same strings over and over if you call that method in a tight loop. – Corak Sep 02 '13 at 15:34
  • Corrected, thanks. Didn't think about the randomizer factor. – Uri Sep 02 '13 at 15:38
  • Also, you could probably change `var abecedary = new char[] { 'a', 'b' };` to `var abecedary = "ab";`, since a `string` is basically a `char[]`. – Corak Sep 02 '13 at 15:39
  • There are lots of Chinese characters; it seems it would take a lot of time to put them into abecedary one by one. – Micheal Wan Sep 03 '13 at 03:09

To generate a string, you can try this function:

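    // Shared generator: creating a new Random on every call can reuse the same seed
    // (and therefore return the same string) when the method is called in a tight loop.
    private static readonly Random SharedRandom = new Random();

    private static string GenerateString(int length, int minCharCode, int maxCharCode)
    {
        var builder = new StringBuilder(length);
        for (var i = 0; i < length; i++)
        {
            // Random.Next excludes maxCharCode; the cast is valid for code points up to 0xFFFF.
            builder.Append((char) SharedRandom.Next(minCharCode, maxCharCode));
        }
        return builder.ToString();
    }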

minCharCode and maxCharCode define the Unicode range to draw from (maxCharCode itself is excluded, because Random.Next excludes its upper bound). You can call this function hundreds of times, and if you want variable-length strings you can randomize the length parameter on each call. Usage:

    static void Main(string[] args)
    {
        const int minJpnCharCode = 0x4e00;
        const int maxJpnCharCode = 0x4f80;
        var random = new Random();
        for (int i = 0; i < 10000; i++)
        {
            Console.WriteLine(GenerateString(random.Next(0, 50), minJpnCharCode, maxJpnCharCode));                
        }
        Console.ReadLine();
    }

Update: Chinese and Japanese characters occupy many ranges in Unicode. You can look them up here (Japanese) or just use Google. Then you need the following code:

    /// <summary>
    /// Represents our characters range
    /// </summary>
    class Range
    {

        public int Begin { get; set; }

        public int End { get; set; }

        public Range(int begin, int end)
        {
            Begin = begin;
            End = end;
        }

    }

Our generator:

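    // Shared generator (declare it once per class): a new Random per call can reuse
    // the same seed and repeat strings when the method is called in a tight loop.
    private static readonly Random SharedRandom = new Random();

    private static string GenerateString(int length, IList<Range> ranges)
    {
        var builder = new StringBuilder(length);
        for (var i = 0; i < length; i++)
        {
            // Pick a range first, then a character inside it (range.End is excluded).
            var rangeIndex = SharedRandom.Next(ranges.Count);
            var range = ranges[rangeIndex];
            builder.Append((char)SharedRandom.Next(range.Begin, range.End));
        }
        return builder.ToString();
    }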

Usage:

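        // Each range is [Begin, End): the End code is never generated.
        var ranges = new[]
        {
            new Range(0x4e00, 0x4f80),
            new Range(0x5000, 0x9fa0),
            new Range(0x3400, 0x4db0),
            new Range(0x30a0, 0x30f0),
            // and so on.. add any range here
        };
        // Used only to vary the length of each generated string.
        var random = new Random();
        for (var i = 0; i < 10000; i++)
        {
            Console.WriteLine(GenerateString(random.Next(0, 50), ranges));
        }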
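The `(char)` cast in the generator only covers code points up to U+FFFF. Some CJK ideographs, for example CJK Unified Ideographs Extension B starting at U+20000, lie above that and are stored as a surrogate pair in a .NET string. A minimal sketch of one way to handle them, assuming the same convention that the end of each range is excluded; `CodePointStrings`, `Generate` and the `bounds` parameter are illustrative names, not part of the code above:

```csharp
using System;
using System.Text;

static class CodePointStrings
{
    // One shared generator, so repeated calls do not reuse a seed.
    private static readonly Random Rng = new Random();

    // Hypothetical helper: bounds[i] is { begin, end } with end excluded.
    // length counts code points, so the resulting string.Length may be larger.
    public static string Generate(int length, int[][] bounds)
    {
        var builder = new StringBuilder(length * 2);
        for (var i = 0; i < length; i++)
        {
            var range = bounds[Rng.Next(bounds.Length)];
            int codePoint = Rng.Next(range[0], range[1]);
            // ConvertFromUtf32 yields one char for code points up to U+FFFF and a
            // surrogate pair above that; it throws for 0xD800..0xDFFF, so keep
            // the ranges clear of the surrogate block.
            builder.Append(char.ConvertFromUtf32(codePoint));
        }
        return builder.ToString();
    }
}
```

For instance, `CodePointStrings.Generate(10, new[] { new[] { 0x4E00, 0x9FA6 }, new[] { 0x20000, 0x2A6D7 } })` would mix basic ideographs with Extension B ones. Whether the fonts and the system under test can display them is a separate question.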
Deffiss
  • Thanks Deffiss, it helps. But according to this post, http://stackoverflow.com/questions/1366068/whats-the-complete-range-for-chinese-characters-in-unicode , there seem to be several ranges for Chinese characters in Unicode. Is there any code to handle this? – Micheal Wan Sep 03 '13 at 03:12