2

I have a large string, received from a TCP listener, in the following format:

"1,7620257787,0123456789,99,0922337203,9223372036,32.5455,87,12.7857,1/1/2012,9223372036:1,7620257787,0123456789,99,0922337203,9223372036,32.5455,87,12.7857:2/1/2012,234234234:3,7620257787,01234343456789,99,0922337203,9223372036,32.5455,87,12.7857,1/1/2012,9223372036:34,76202343457787,012434343456789,93339,34340922337203,9223372036,32.5455,87,12.7857,1/1/2012,9223372036"

You can see that this is a ':'-separated string of records, each of which consists of comma-separated fields.

I am looking for the best (fastest) way to split the string into a given number of chunks, making sure that each chunk contains only full records (the string up to a ':').

In other words, there should not be any chunk that does not end with ':'.

E.g. a 20 MB string split into 4 chunks of about 5 MB each with complete records (so the size of each chunk may not be exactly 5 MB, but very near to it, and the total of all 4 chunks will be 20 MB).

I hope you can understand my question (sorry for the bad English).

I like the approach in the following link, but it does not keep records whole while splitting, and I don't know whether it is the best and fastest way.

Split String into smaller Strings by length variable

Community
Imran Rizvi
  • Start with `string.Split()`. If you figure it really _is_ the bottleneck in your program, do a custom/optimized version (if at all possible). Other than that, this question is most likely a duplicate of [this](http://stackoverflow.com/q/568968/21567) one. – Christian.K Jul 09 '12 at 11:22
  • Did you already see this thread? "Does any one know of a faster method to do String.Split()?": http://stackoverflow.com/questions/568968/does-any-one-know-of-a-faster-method-to-do-string-split?lq=1 – Jens H Jul 09 '12 at 11:25
  • Please define 'large'. Is your string just a few KB, or are you talking about MBs or more? – Steve Jul 09 '12 at 11:29
  • What version of .NET are you using in this project? – Anton Semenov Jul 09 '12 at 12:28
  • @AntonSemenov I am using .Net 4.0 – Imran Rizvi Jul 09 '12 at 12:32
  • Are your 'records' (the data between ":") of fixed length, as it appears from your example? – Steve Jul 09 '12 at 12:50
  • @Steve , what I am showing is sample data only. – Imran Rizvi Jul 09 '12 at 12:51

5 Answers

3

I don't know how large a 'large string' is, but initially I would just try it with the String.Split method.

Freeman
Maarten
1

The idea is to divide the length of your data by the number of blocks required, then search backwards from each block boundary for the last separator in the current block.

    private string[] splitToBlocks(string data, int numBlocks, char sep)
    {
        // With a single block (or no data) there is nothing to split:
        // return the whole input as the only element
        if (numBlocks <= 1 || data.Length == 0)
        {
            return new string[] { data };
        }

        string[] result = new string[numBlocks];

        // The optimal size of each block
        int blockLen = data.Length / numBlocks;

        int idx = 0; int pos = 0; int lastSepPos = blockLen;
        while (idx < numBlocks)
        {
            // Search backwards for the last sep at or before lastSepPos.
            // The final block skips the search and takes everything that is left,
            // so a trailing record without a closing sep is not dropped.
            // Note: a block that contains no sep at all is not handled here.
            if (idx < numBlocks - 1)
            {
                char c = data[lastSepPos];
                while (c != sep) { lastSepPos--; c = data[lastSepPos]; }
            }

            // Get the block data in the result array
            result[idx] = data.Substring(pos, (lastSepPos + 1) - pos);

            // Reposition for the next block
            idx++;
            pos = lastSepPos + 1;

            if (idx == numBlocks - 1)
                lastSepPos = data.Length - 1;
            else
                lastSepPos = blockLen * (idx + 1);
        }
        return result;
    }

Please test it. I have not fully tested for fringe cases.
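For the fringe cases, here is a minimal sketch of a more defensive variant. It is not the code from this answer: the class `RecordChunker` and its method are hypothetical names, and it assumes that returning fewer than `numBlocks` chunks is acceptable when there are not enough separators. It never cuts inside a record, extends a block forward when no separator lies inside it, and puts any trailing record without a separator in the last chunk.

```csharp
using System;
using System.Collections.Generic;

static class RecordChunker
{
    // Hypothetical helper: splits data into at most numBlocks chunks,
    // cutting only directly after a separator.
    public static List<string> SplitToBlocks(string data, int numBlocks, char sep)
    {
        var result = new List<string>();
        if (string.IsNullOrEmpty(data) || numBlocks <= 1)
        {
            result.Add(data ?? string.Empty);
            return result;
        }

        int blockLen = data.Length / numBlocks;
        int pos = 0;

        while (pos < data.Length && result.Count < numBlocks - 1)
        {
            // Ideal boundary of this chunk, clamped to [pos, data.Length - 1].
            int ideal = Math.Min(blockLen * (result.Count + 1), data.Length - 1);
            int target = Math.Max(pos, ideal);

            // Look backwards for a separator inside [pos, target].
            int end = data.LastIndexOf(sep, target, target - pos + 1);

            // None found: the record is longer than a block, so extend forward.
            if (end < 0) end = data.IndexOf(sep, target);
            if (end < 0) break; // no separator left; the remainder is one chunk

            result.Add(data.Substring(pos, end - pos + 1));
            pos = end + 1;
        }

        // Whatever is left (including a trailing record without sep) is the last chunk.
        if (pos < data.Length) result.Add(data.Substring(pos));
        return result;
    }
}
```

With this sketch, `RecordChunker.SplitToBlocks("aa:bb:cc:dd", 2, ':')` yields the two chunks `"aa:bb:"` and `"cc:dd"`, and concatenating the chunks always reproduces the input.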

Imran Rizvi
Steve
  • Sorry for not providing full information; the records are variable-length strings. – Imran Rizvi Jul 09 '12 at 12:54
  • Do you know an 'average' record size? – Steve Jul 09 '12 at 12:58
  • this looks exactly what I was looking, will try it tomorrow and compare the performance with the one given by Antone – Imran Rizvi Jul 09 '12 at 18:52
  • @ImranRizvi, thanks for the added check. Another case to check is: what happens when you get a block without a separator inside? Admittedly, it's really improbable, but the code doesn't handle that situation. – Steve Jul 10 '12 at 19:50
  • Are you sure of your last change? if(numBlocks<=1 ....), if numBlocks == 1 I think you need all of the input data. – Steve Jul 11 '12 at 07:33
1

OK, I suggest an approach in two steps:

  1. Split the string into chunks (see below)
  2. Check the chunks for completeness

Splitting the string into chunks with the help of LINQ (the LINQ extension method is taken from Split a collection into `n` parts with LINQ?):

    string tcpstring = "chunk1 : chunck2 : chunk3: chunk4 : chunck5 : chunk6";
    int numOfChunks = 4;

    var chunks = (from string z in (tcpstring.Split(':').AsEnumerable()) select z).Split(numOfChunks);

    // Re-join the records of each chunk with the ':' separator
    List<string> result = new List<string>();
    foreach (IEnumerable<string> chunk in chunks)
    {
        result.Add(string.Join(":", chunk));
    }

    .......

    static class LinqExtensions
    {
        // Distributes the items round-robin: item i goes to part (i % parts),
        // so neighbouring records end up in different parts
        public static IEnumerable<IEnumerable<T>> Split<T>(this IEnumerable<T> list, int parts)
        {
            int i = 0;
            var splits = from item in list
                         group item by i++ % parts into part
                         select part.AsEnumerable();
            return splits;
        }
    }

Have I understood your aims correctly?

[EDIT] In my opinion, if performance is a consideration, it is better to use the String.Split method for chunking.
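As an illustration of chunking with String.Split alone, here is a minimal sketch that keeps the records in their original order and groups them by count. The class `RecordGrouping` and its method are hypothetical names, not part of the answer, and the sketch assumes that an equal number of records per chunk is an acceptable stand-in for an equal size in bytes.

```csharp
using System;
using System.Collections.Generic;

static class RecordGrouping
{
    // Hypothetical helper: groups consecutive records into at most numOfChunks chunks.
    public static List<string> SplitByRecordCount(string data, int numOfChunks, char sep)
    {
        if (numOfChunks < 1) throw new ArgumentOutOfRangeException("numOfChunks");

        string[] records = data.Split(sep);
        var result = new List<string>();

        // Ceiling division, so that no more than numOfChunks chunks are produced.
        int perChunk = (records.Length + numOfChunks - 1) / numOfChunks;

        for (int start = 0; start < records.Length; start += perChunk)
        {
            int count = Math.Min(perChunk, records.Length - start);
            // Join only the slice [start, start + count) back together with sep.
            result.Add(string.Join(sep.ToString(), records, start, count));
        }
        return result;
    }
}
```

For example, `RecordGrouping.SplitByRecordCount("a:b:c:d:e", 2, ':')` gives `"a:b:c"` and `"d:e"`. The separator between two chunks is dropped, so it has to be appended again if each chunk must end with ':'.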

Community
Anton Semenov
  • How can I pass the number of chunks to create here? – Imran Rizvi Jul 09 '12 at 12:52
  • I didn't understand your question clearly. For example, you received a string from TCP, it contains 10 chunks, but you want only three. Will you take only the first three into account or not? – Anton Semenov Jul 09 '12 at 12:57
  • Suppose I received 10000 records (':' separated) and I want to create 4 chunks, so each chunk will contain 2500 records. Is it clear? Your code seems to create 10000 chunks. – Imran Rizvi Jul 09 '12 at 12:59
  • Looks closer. I don't want to enumerate the ':'-separated values, so I will need only `IEnumerable<string>`, not `IEnumerable<IEnumerable<string>>`. – Imran Rizvi Jul 09 '12 at 14:12
  • I suppose that in most cases you want to get array[N][X], where N is the number of chunks and X is the number of records in each chunk. So you will get an IEnumerable. I made some modifications to my code, so you can see clearly what I am talking about. – Anton Semenov Jul 09 '12 at 14:34
  • I am able to get the IEnumerable, but I lost the ':' separator; what I am getting in a chunk now is one very long comma-separated string. – Imran Rizvi Jul 10 '12 at 07:31
  • see my answer again, the `result` list would contain what you are actually asking for – Anton Semenov Jul 10 '12 at 12:07
0

It seems you want to split on ":" (you can use the Split method). Then you have to add ":" back to each chunk after splitting. (You can then split each of those strings on "," to get the fields.)
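A minimal sketch of the two-level split described above. The class `RecordParsing` and its method are hypothetical names; empty records (for instance after a trailing ':') are assumed to be irrelevant and are dropped.

```csharp
using System;

static class RecordParsing
{
    // Hypothetical helper: splits the input into records on ':'
    // and each record into its fields on ','.
    public static string[][] ParseRecords(string data)
    {
        string[] records = data.Split(new[] { ':' }, StringSplitOptions.RemoveEmptyEntries);

        var fields = new string[records.Length][];
        for (int i = 0; i < records.Length; i++)
        {
            fields[i] = records[i].Split(',');
        }
        return fields;
    }
}
```

Calling `RecordParsing.ParseRecords("1,2:3,4,5")` returns two records, the first with two fields and the second with three.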

Michel Keijzers
0
    int index = yourstring.IndexOf(":");

    string whatever = yourstring.Substring(0, index);

    yourstring = yourstring.Substring(index + 1); // make a new string without the part you just cut out (and without the ':')

This is only a general example; all you need to do is put it in a loop that runs as long as a ":" character is still found. Cheers...
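The loop could look roughly like the sketch below. The class `RecordReader` and its method are hypothetical names. Instead of creating a shortened copy of the string on every pass, it moves a start index forward, which avoids repeatedly copying a large string.

```csharp
using System;
using System.Collections.Generic;

static class RecordReader
{
    // Hypothetical helper: collects the records between separators
    // by walking the string with IndexOf.
    public static List<string> ReadRecords(string data, char sep)
    {
        var records = new List<string>();
        int start = 0;
        int index;

        // Keep going while another separator is found at or after start.
        while ((index = data.IndexOf(sep, start)) >= 0)
        {
            records.Add(data.Substring(start, index - start));
            start = index + 1; // continue right after the separator
        }

        // A last record that is not followed by a separator.
        if (start < data.Length) records.Add(data.Substring(start));
        return records;
    }
}
```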

Freeman
  • Are you sure about that string.Split(index) method? I don't know an overload with an int-argument. – Maarten Jul 09 '12 at 11:29
  • Sorry, I meant string.Substring(0, index), where 0 is the start index and index is the point where it would stop. – Freeman Jul 09 '12 at 11:29