7

In C++, how do I split a string into evenly-sized smaller string?

For example, I have a string "012345678" and want it to split it into 5 smaller strings, and this should return me something like "01", "23", "45", "67", "8".

I'm having trouble of determining the length of the smaller strings. In the previous example, the original string is size 9, and I want to split it into 5 smaller string, so each smaller string except the last one should be length 9 / 5 = 1, but then the last one will be length 9 - 1* 4 = 5, which is unacceptable.

So the formal definition of this problem: the original string is split into EXACTLY n substrings, and no two of the substrings should differ by greater than 1 in length.

My focus is not on C++ syntax or library. It's how to design an algorithm so that the returned string can be nearly-equal in size.

Shuo
  • 4,749
  • 9
  • 45
  • 63

6 Answers6

5

To divide N items into M parts, with lengths within one unit, you can use formula (N*i+N)/M - (N*i)/M as length of i'th part, as illustrated below.

 #include <string>
 #include <iostream>
 using namespace std;

 int main() {
   string text = "abcdefghijklmnopqrstuvwxyz";
   int N = text.length();
   for (int M=3; M<14; ++M) {
     cout <<" length:"<< N <<"  parts:"<< M << "\n";
     int at, pre=0, i;
     for (pre = i = 0; i < M; ++i) {
       at = (N+N*i)/M;
       cout << "part " << i << "\t" << pre << "\t" << at;
       cout << "\t" << text.substr(pre, at-pre) << "\n";
       pre = at;
     }
   }
   return 0;
 } 

For example, when M is 4 or 5, the code above produces:

  length:26  parts:4
 part 0 0   6   abcdef
 part 1 6   13  ghijklm
 part 2 13  19  nopqrs
 part 3 19  26  tuvwxyz
  length:26  parts:5
 part 0 0   5   abcde
 part 1 5   10  fghij
 part 2 10  15  klmno
 part 3 15  20  pqrst
 part 4 20  26  uvwxyz
Community
  • 1
  • 1
James Waldby - jwpat7
  • 8,593
  • 2
  • 22
  • 37
4

My solution:

std::vector<std::string> split(std::string const & s, size_t count)
{
       size_t minsize = s.size()/count;
       int extra = s.size() - minsize * count;
       std::vector<std::string> tokens;
       for(size_t i = 0, offset=0 ; i < count ; ++i, --extra)
       {
          size_t size = minsize + (extra>0?1:0);
          if ( (offset + size) < s.size())
               tokens.push_back(s.substr(offset,size));
          else
               tokens.push_back(s.substr(offset, s.size() - offset));
          offset += size;
       }       
       return tokens;
}

Test code:

int main() 
{
      std::string s;
      while (std::cin >> s)
      {
        std::vector<std::string> tokens = split(s, 5);
        //output
        std::copy(tokens.begin(), tokens.end(), 
              std::ostream_iterator<std::string>(std::cout, ", "));
        std::cout << std::endl;
      }
}

Input:

012345
0123456
01234567
012345678
0123456789
01234567890

Output:

01, 2, 3, 4, 5, 
01, 23, 4, 5, 6, 
01, 23, 45, 6, 7, 
01, 23, 45, 67, 8, 
01, 23, 45, 67, 89, 
012, 34, 56, 78, 90, 

Online Demo : http://ideone.com/gINtK

This solution tends to make the tokens even, i.e all tokens may not be of same size.

Nawaz
  • 353,942
  • 115
  • 666
  • 851
  • I don't think this is correct. since size = 1, you end up with 0, 1, 2, 3, 4, 5, 6... – Shuo Nov 21 '11 at 05:53
  • @Shuo: Yes. I was doubting that. Corrected it now. – Nawaz Nov 21 '11 at 05:59
  • How about "012345" split into 5 sub-strings? – Shuo Nov 21 '11 at 06:05
  • @Shuo: How do you want to split it? Your question is not clear enough. – Nawaz Nov 21 '11 at 06:08
  • Yes, I guess I didn't make my question explicit, now I've got the formal definition. – Shuo Nov 21 '11 at 06:24
  • @Shuo: This is an interesting question. Initially, I did think it would be so. I will work on it after office time, and hopefully will come up with better and elegant algorithm. There are many things to consider. I'm kind of busy doing office work. – Nawaz Nov 21 '11 at 06:26
  • This works. But if I were designing a general interface for this I would design it as an iterator style interface. That enables you to pass your list of substrings into all kinds of things without having to build a vector of them first. – Omnifarious Nov 21 '11 at 07:40
  • If I were to split "012345" into 5 sub-strings, this algorithm only returns 3 sub-strings. So it is incorrect. – Shuo Nov 22 '11 at 01:56
  • @Shuo: Now, I posted a better answer. Tested it with 6 inputs, seems to work fine! – Nawaz Nov 22 '11 at 04:49
1

It's enough to know the length of substrings;
Assume m is size() of your string:

int k = (m%n == 0)? n : n-m%n;  

Then k of substrings should be of length m/n and n-k of length m/n+1.

a-z
  • 1,634
  • 4
  • 16
  • 35
0

Try substr.

Christian Specht
  • 35,843
  • 15
  • 128
  • 182
fat
  • 6,435
  • 5
  • 44
  • 70
0

You get iterators for where you want to split it, then use those to construct new strings. For example:

std::string s1 = "string to split";
std::string::iterator halfway = s1.begin() + s1.size() / 2;
std::string s2(s1.begin(), halfway);
std::string s3(halfway, s1.end());
Benjamin Lindley
  • 101,917
  • 9
  • 204
  • 274
0

Lets's say the string length is L and it must be split in n substrings.

# Find the next multiple of `n` greater than or equal to `L`

L = 9
n = 5

LL = n * (L / n)
if LL < L:
    LL += n

# Split a string of length LL into n equal sizes. The string is at
# most (n-1) longer than L.

lengths = [(LL / n) for x in range (n)]

# Remove one from the first (or any) (LL-L) elements.
for i in range (LL-L):
    lengths [i] = lengths [i] - 1

# Get indices from lengths. 
s = 0
idx = []
for i in lengths:
    idx.append (s)
    s = s + i
idx.append (L)

print idx

EDIT: OK, OK, I forgot it should be C++.

EDIT: Here it goes ...

#include <vector>
#include <iostream>

unsigned int L = 13;
unsigned int n = 5;

int
main ()
{
  int i;
  unsigned int LL;
  std::vector<int> lengths, idx;

  /* Find the next multiple of `n` greater than or equal to `L` */
  LL = n * (L / n);
  if (LL < L)
    LL += n;

  /* Split a string of length LL into n equal sizes. The string is at
     most (n-1) longer than L. */
  for (i = 0; i < n; ++i)
    lengths.push_back (LL/n);

  /*  Remove one from the first (or any) (LL-L) elements. */
  for (i = 0; i < LL - L; ++i)
    --lengths [i];

  /* Get indices from lengths.  */
  int s = 0;
  for (auto &ii: lengths)
    {
      idx.push_back (s);
      s += ii;
    }

  idx.push_back (L);

  for (auto &i : idx)
    std::cout << i << " ";

  std::cout << std::endl;
  return 0;
}
chill
  • 16,470
  • 2
  • 40
  • 44