0

I wanted to create a function that will split a string by a delimiter.. I know there's already a function that does this thing but I wanted to make on my own.. But it doesn't work as it should.

char** Engine::splitString(const char* text, char delimiter)
{

  char** splitted;  
  splitted = (char**)malloc(50 * sizeof(char*));
  for (int y = 0; y < 50; y++)
    splitted[y] = (char*)malloc((strlen(text) + 2) * sizeof(char));

  int delimiterPosition[50];
  int arrayLength = 0;
  int f = 0;
  int g = 0;
  for (int x = 0; x < strlen(text); x++)    
  {
    if (text[x] == delimiter)
    {
        delimiterPosition[f] = x;
        f++;
    }
  }

  for (int x = 0; x < 50; x++)
    if (delimiterPosition[x] > 0 )
        arrayLength++;


  while (g < arrayLength) {
     if (g == 0) {
        for (int y = 0; y < delimiterPosition[0]; y++)
        {
            splitted[g][y] = text[y];
        }
    }
    else if(g > 0)
    {
        for (int y = delimiterPosition[g - 1]; y < delimiterPosition[g] - delimiterPosition[g - 1]; y++)
        {
            splitted[g][y] = text[y];
        }
    }
    g++;
}
return splitted;
}

First of all, I declared a two dimensional char array -> splitted. This was the variable that I should store my results into. Then I allocated a memory for it.. I wanted to have 50 words maximum. After that I created integer array.. this served as a storage for delimiters' positions. I also defined some variables below it for my code. Then I looped through the text to see if there's any delimiter.. if yes, I wanted to store it's position to a certain position in array, starting off from 0. I looped through delimiterPosition's array to how many positions I have stored. Then I made a simple loop using while to take all the characters up to the delimiter's position and store them to splitted[g][y] .. g represents the whole word.. y represents the character in that word. If g was greater than zero, I tok the previous position of a delimiter and then substracted the current from the previous.. and that gave me the distance between the first delimiter and the next..

The main problem here is that the first word is written correctly, the second one is not working, but it has some weird characters behind it when I try to call it.. is the text somehow leaking? the second one isn't being stored at all?:

    char** strings = en.splitString("Hello;boy", ';');
    printf("%s", strings[1]);

First word: enter image description here

Second: enter image description here

Any solutions, guys ? :) Thank you for any comment.

MerryGR
  • 99
  • 1
  • 6
  • 3
    This looks more like C code, not C++. – PaulMcKenzie Apr 08 '20 at 18:06
  • 1
    I assume you are not permitted to write proper `c++` code. – drescherjm Apr 08 '20 at 18:06
  • 1
    Where are you learning this? You're writing "C with classes" which is not proper C++. In C++ you should be using `std::string` instead of char arrays, and `new` instead of `malloc`. You should get a [good book](https://stackoverflow.com/q/388242/9254539). – eesiraed Apr 08 '20 at 18:08
  • 1
    You should not even be using `new` since at least 2011 or at least avoid it. – drescherjm Apr 08 '20 at 18:08
  • 2
    If this code had any resemblance to C++, the return type would be `std::vector`, not `char **`. The strings themselves would be `std::string`, not `malloc()` stuff. – PaulMcKenzie Apr 08 '20 at 18:09
  • Your calculation of `arrayLength` is incorrect becuse it replies on values in the `delimiterPosition` array that have not been initialised. – john Apr 08 '20 at 18:13
  • @john Thank you, I will try it.. – MerryGR Apr 08 '20 at 18:14
  • It's also not clear if `arrayLength` is meant to have a different value than `f`. Seems to me you could get rid of one of those variables. – john Apr 08 '20 at 18:15
  • if this is for exercise and not using the existing function on purpose I would advice to concentrate on one thing that you are not using, because if you are not using `std::string`, not `std::vector` and no algorithms you arent really using c++ – 463035818_is_not_an_ai Apr 08 '20 at 18:17
  • When you copy characters to the `splitted` array you fail to copy a nul terminator. As everyone has already said, this code would be a lot easier if you used a bit more C++. – john Apr 08 '20 at 18:18
  • Thank you guys.. and yes, it was a part of the exercise, nothing more.. I thought I could re-create that function somehow.. I must have missed some things, but thank you for you reactions.. – MerryGR Apr 08 '20 at 18:18
  • oof, `malloc`? `char**`? nooo! – Asteroids With Wings Apr 08 '20 at 18:23
  • To be clear, when the people above say "this isn't C++", that is not true. It absolutely is C++. It's just not _idiomatic_ C++. It's not the preferred way to write it. Those people are just really trying to hammer that point home. – Asteroids With Wings Apr 08 '20 at 18:23
  • @Merry-Go-Rounds *I know there's already a function that does this thing* -- If you're referring to `strtok`, even that function has major drawbacks. There are better C++ ways than even with `strtok`. – PaulMcKenzie Apr 08 '20 at 18:24
  • @AsteroidsWithWings I thought it does't matter which command I use where as long as I include them.. if I am mistaken, you should better correct me. – MerryGR Apr 08 '20 at 18:27
  • @Merry-Go-Rounds Sorry, I didn't understand your comment. – Asteroids With Wings Apr 08 '20 at 18:31
  • @AsteroidsWithWings I mean I could include stdio.h even in c++.. i think it doesn't matter. – MerryGR Apr 08 '20 at 18:33
  • 1
    @Merry-Go-Rounds It works and is legal. It's just not good style. The "old" facilities inherited from C are considered error-prone and cumbersome. When you're writing C++, you "should" use more modern, type-safe functionality. That's what they're saying. – Asteroids With Wings Apr 08 '20 at 18:39

1 Answers1

2

This does not initialize the memory:

 int delimiterPosition[50];

So its content is potentially random (and its undefined to read from unless you initialize it first). So here:

 if (delimiterPosition[x] > 0 ) // Is potentially invalid if x >= f

Easily solved with:

 int delimiterPosition[50] = {0};

Potential for overflow here:

        delimiterPosition[f] = x;
        f++;

You don't validate that f remains in the correct range (less than 50). Another easy fix:

  size_t stringLen = strlen(text); // Don't need to recalculate this each time!
  for (int x = 0; f < 50 && x < stringLen; x++)    
  {
    if (text[x] == delimiter)
    {
        delimiterPosition[f] = x;
        f++;
    }
  }

Here is the problem you are complaining about:


    for (int y = 0; y < delimiterPosition[0]; y++)
    {
        splitted[g][y] = text[y];
    }

You copy the string.
But you don't add a terminator to the string. So when you try and print it you see all the extra characters on the end.

    for (int y = 0; y < delimiterPosition[0]; y++)
    {
        splitted[g][y] = text[y];
    }
    splitted[g][y] = '\0';  // Add string terminator.

For the second a subsequent string you have the null terminator problem. But you also have the issue that you are copying the string to not the beginning.

        // After the first string the value of y in an offset into text only
        // So when used with `splitted[g]` you are offset from the beginning
        // if the string.
        splitted[g][y] = text[y];

Also your test for the end of the string is wrong:

Remember you start at:

int y = delimiterPosition[g - 1]

So y is an offset into the string. So as you increase it it will always be an offset not a length.

// So this test is wrong (you are using a length not an offset.
y < delimiterPosition[g] - delimiterPosition[g - 1]

Lets fix both at the same time:

    int dstIndex = 0;
    for (int y = delimiterPosition[g - 1]; y < delimiterPosition[g]; y++, dstIndex++)
    {
        splitted[g][dstIndex] = text[y];
    }
    splitted[g][dstIndex] = '\0';
Martin York
  • 257,169
  • 86
  • 333
  • 562
  • _"You don't validate that f remains in the correct range (less than 50). Another easy fix:"_ Your easy fix can still leave `f` at 50. It's just that that's fine because in that case it's never used again ;) – Asteroids With Wings Apr 08 '20 at 18:49
  • Thank you, it partly helped.. but, maybe I missed something what you wrote.. but it wasn't complete.. the first word was fixed, the second one wasn't.. added a few conditions, extended my code, debugged it and now it works fine.. @Martin York and Ateroids With Wings - Thank you both, guys. – MerryGR Apr 08 '20 at 19:30