
This question is very close to this topic, but I prefer the readability and the clarification about pointers that this solution offered.

I have a data file that I read into one very long array of char. I want to split this string into an array in which each element is a string corresponding to one line of the file.
The solutions I have seen all use fixed-size arrays. Since I don't know the length of each line, I really need to allocate all of them dynamically, but I can't find the length of the lines because strtok doesn't seem to put a null character \0 at the end of each string.

What I have so far are these two attempts, but neither works:

int get_lines(char *file, char **lines) {
    int nb_lines = 0;
    char *token = strtok(file, "\n");
    for(int i = 0; token != NULL; i++) {
        token = strtok(NULL, "\n");
        nb_lines = i;
    }
    nb_lines++;

    lines = malloc((nb_lines + 1) * sizeof(char*));
    lines[nb_lines] = '\0';

    token = strtok(file, "\n");
    for(int i = 0; token != NULL; i++) {
        token = strtok(NULL, "\n");
        int nb_char = 0;
        for(int j = 0; token[j] != '\n'; j++) //This will cause SIGSEGV because strtok don't keep the '\n' at the end
            nb_char = j;
        nb_char++;
        token[nb_char] = '\0'; //This cause SIGSEGV because token's allocation finish at [nb_char-1]
        lines[i] = malloc(strlen(token) * sizeof(char)); //strlen cause SIGSEGV because I cannot place the '\0' at the end of token
        printf("%s", token); //SIGSEGV because printf don't find the '\0'
        lines[i] = token;
    }

    for(int i = 0; i < nb_lines; i++) {
        printf("%s", lines[i]); //SIGSEGV
    }

    return nb_lines;
}

The code above shows the idea of what I want to do and why it doesn't work.

Below is another attempt, but I'm stuck at the same point:

int count_subtrings(char* string, char* separator) {
    int nb_lines = 0;
    char *token = strtok(string, separator);
    for(int i = 0; token != NULL; i++) {
        token = strtok(NULL, separator);
        nb_lines = i;
    }
    return nb_lines + 1;
}

char** split_string(char* string, char* separator) {
    char **sub_strings = malloc((count_subtrings(string, separator) + 1) * sizeof(char*));
    for(int i = 0; string[i] != EOF; i++) {
        //How to get the string[i] lenght to malloc them ?
    }
}

My file is quite big and the lines can be long too, so I don't want to malloc another buffer of (strlen(file) + 1) * sizeof(char) for every line just to be sure none of them causes a SIGSEGV. I also find that solution quite dirty. If you have another idea, I would be really happy.

(Sorry for the English mistakes, I'm not really good.)

Tix'at
  • Possible duplicate of [Handle memory while reading long lines from a file in C](http://stackoverflow.com/questions/43779687/handle-memory-while-reading-long-lines-from-a-file-in-c) – Badda May 05 '17 at 11:30
  • You could use a dynamic linked list kind of data structure. – ctrl-shift-esc May 05 '17 at 11:30
  • check out realloc – AndersK May 05 '17 at 14:25
  • Use `getline()` to read the file line by line. Copy the resulting line pointer to the next entry in an array of `char*`. Reset the line pointer to NULL before each call to `getline()`. I suggest allocating the array of `char*` dynamically, so you can call `realloc()` when the array gets full. – user3629249 May 05 '17 at 16:42
  • Thanks @user3629249, but I'll use the accepted answer, since I already have my file loaded in an array and wanted to know how to split it, even though your answer works perfectly too. – Tix'at May 05 '17 at 16:51

2 Answers


Your approach with strtok has two drawbacks: First, strtok modifies the string, so you can only pass the original string to it once. Second, it skips empty lines, because it treats a run of newlines as a single token separator. (I don't know whether that is a concern to you.)
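Both drawbacks can be observed with a small helper. This is only an illustrative sketch; the name count_strtok_lines is hypothetical and not part of the question's code:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count the tokens strtok() yields when splitting
 * buf at '\n'. buf must be writable, because strtok() overwrites each
 * separator it consumes with '\0'. */
size_t count_strtok_lines(char *buf)
{
    size_t count = 0;

    for (char *tok = strtok(buf, "\n"); tok != NULL; tok = strtok(NULL, "\n"))
        count++;

    return count;
}
```

Called on a writable buffer holding "one\n\n\nfour\n", this should return 2 rather than 4, because the two empty lines are swallowed. Afterwards the buffer reads as just "one" when treated as a string, which is why a second strtok pass over the same buffer only ever sees the first line.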

You can count the newlines with a single pass through the string. Allocate memory for your line array and make a second pass, where you split the string at newlines:

char **splitlines(char *msg)
{
    char **line;
    char *prev = msg;
    char *p = msg;

    size_t count = 0;
    size_t n;

    while (*p) {
        if (*p == '\n') count++;
        p++;
    }

    line = malloc((count + 2) * sizeof(*line));
    if (line == NULL) return NULL;

    p = msg;
    n = 0;
    while (*p) {
        if (*p == '\n') {
            line[n++] = prev;
            *p = '\0';
            prev = p + 1;
        }

        p++;
    }

    if (*prev) line[n++] = prev;
    line[n] = NULL;     /* sentinel; n remains the number of lines */

    return line;
}

I've allocated two more line pointers than the newline count: one for the case that the last line doesn't end with a newline and another one to place a NULL sentinel at the end, so that you know where your array ends. (You could, of course, return the actual line count via a pointer to a size_t.)
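A minimal sketch of that variant, assuming a hypothetical name splitlines_n and an optional out-parameter pn (neither appears in the code above):

```c
#include <stdlib.h>

/* Split msg in place at each '\n' and return a NULL-terminated array of
 * pointers into msg. If pn is not NULL, the line count is stored in *pn.
 * The caller frees the returned array, but not the individual lines,
 * because they all live inside msg. */
char **splitlines_n(char *msg, size_t *pn)
{
    size_t count = 0;

    /* First pass: count the newlines. */
    for (const char *p = msg; *p; p++)
        if (*p == '\n') count++;

    /* One extra slot for an unterminated last line, one for the sentinel. */
    char **line = malloc((count + 2) * sizeof(*line));
    if (line == NULL) return NULL;

    /* Second pass: cut the string at each newline. */
    size_t n = 0;
    char *prev = msg;
    for (char *p = msg; *p; p++) {
        if (*p == '\n') {
            line[n++] = prev;
            *p = '\0';
            prev = p + 1;
        }
    }

    if (*prev) line[n++] = prev;   /* last line without a trailing newline */
    line[n] = NULL;                /* sentinel, not counted in n */

    if (pn) *pn = n;
    return line;
}
```

A caller would write something like `size_t n; char **lines = splitlines_n(text, &n);`, check the result for NULL, and later call `free(lines)`. Passing NULL for pn is still allowed when only the sentinel is needed. Unlike the strtok approach, empty lines are kept as empty strings.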

M Oehm
  • First of all, thank you for your help; your algorithm is so clean, I really like it. I have a few questions: Why do you only use `size_t` variables to go through arrays? What do you mean by "return the actual line count via a pointer to a `size_t`", and how does this work? Why doesn't the second `while (*p)` break if `*p` is `0`? In `line[n++] = prev;`, is `n` incremented after the assignment `line[n] = prev;`? And are failures of `malloc()` something I should really care about in all my code? – Tix'at May 05 '17 at 16:01
  • (1) `size_t` is an unsigned integer type; the standard library also uses it for things that can't be negative, such as the value returned by `strlen`. You can use `int`, if you like. (2) Make the function `f(char *s, size_t *pn)` and then, before returning, say `if (pn) *pn = n;` (I've made a blunder there: the line before returning should be `line[n] = NULL`, without increment.) – M Oehm May 05 '17 at 16:23
  • (3) `line[n++] = x` is a typical C idiom. Remember that an array of `n` items has valid indices from 0 to `n - 1`. Item `n` is the item just after the valid range. When appending to an array, assign to that field and increment the count. (4) Yes, you should. In the current code, the calling function should check that the returned pointer isn't null. You could also abort the program when allocation fails for a quick solution. – M Oehm May 05 '17 at 16:26
  • OK, thanks. (1) So `size_t` or `unsigned int` is just a matter of preference? (2) I didn't know you could call a function without all its arguments. (3) I always used `for (int i=0; array[i] != '\0'; i++) array[i] = x` to go through an array; is `while (array[i] != '\0') array[i++] = x` the same, or does it have advantages? (4) I'll be more careful then. (5) You forgot this question: _Why doesn't `while (*p)` break if `*p` is `0`?_ – Tix'at May 05 '17 at 16:45
  • This space really isn't suited to answering many long questions. (1) For counts, prefer unsigned types. (2) You've got that wrong; that would be another function. (3) It depends. In my case I could have used a `for` loop. When the pointer advances conditionally or with different strides, use `while`. (5) I don't understand your question. `if (*p)` is the same as `if (*p != '\0')`. – M Oehm May 05 '17 at 17:09

The following proposed code:

  1. compiles cleanly
  2. (within the limits of the heap size) doesn't care about the input file size
  3. echoes the resulting array of file lines, double spaced, just to show it worked. The double spacing arises because getline() keeps the trailing newline and puts() appends another one; for single spacing, replace puts( lines[i] ) with printf( "%s", lines[i] )

And now the code:

#include <stdio.h>   // getline(), perror(), fopen(), fclose()
#include <stdlib.h>  // exit(), EXIT_FAILURE, realloc(), free()


int main( void )
{
    FILE *fp = fopen( "untitled1.c", "r" );
    if( !fp )
    {
        perror( "fopen for reading untitled1.c failed" );
        exit( EXIT_FAILURE );
    }

    // implied else, fopen successful

    char **lines = NULL;
    size_t availableLines = 0;
    size_t usedLines = 0;

    char *line = NULL;
    size_t lineLen = 0;
    while( -1 != getline( &line, &lineLen, fp ) )
    {
        if( usedLines >= availableLines )
        {
            availableLines = (availableLines)? availableLines*2 : 1;
            char **temp = realloc( lines, sizeof( char* ) * availableLines );
            if( !temp )
            {
                perror( "realloc failed" );
                free( lines );
                fclose( fp );
                exit( EXIT_FAILURE );
            }

            // implied else realloc successful

            lines = temp;
        }

        lines[ usedLines ] = line;
        usedLines++;
        line = NULL;
        lineLen = 0;
    }

    fclose( fp );

    for( size_t i = 0; i<usedLines; i++ )
    {
        puts( lines[i] );
    }

    free( lines );
}

Given that the above code is in a file named untitled1.c, the following is the output:

#include <stdio.h>   // getline(), perror(), fopen(), fclose()

#include <stdlib.h>  // exit(), EXIT_FAILURE, realloc(), free()





int main( void )

{

    FILE *fp = fopen( "untitled1.c", "r" );

    if( !fp )

    {

        perror( "fopen for reading untitled1.c failed" );

        exit( EXIT_FAILURE );

    }



    // implied else, fopen successful



    char **lines = NULL;

    size_t availableLines = 0;

    size_t usedLines = 0;



    char *line = NULL;

    size_t lineLen = 0;

    while( -1 != getline( &line, &lineLen, fp ) )

    {

        if( usedLines >= availableLines )

        {

            availableLines = (availableLines)? availableLines*2 : 1;

            char **temp = realloc( lines, sizeof( char* ) * availableLines );

            if( !temp )

            {

                perror( "realloc failed" );

                free( lines );

                fclose( fp );

                exit( EXIT_FAILURE );

            }



            // implied else realloc successful



            lines = temp;

        }



        lines[ usedLines ] = line;

        usedLines++;

        line = NULL;

        lineLen = 0;

    }



    fclose( fp );



    for( size_t i = 0; i<usedLines; i++ )

    {

        puts( lines[i] );

    }



    free( lines );

}
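The listing above frees only the pointer array before exiting, and every stored line still carries its newline. As a hedged sketch of how both points could be handled, here are two hypothetical helpers, chomp and free_lines, which are not part of the program above:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: cut the string at its first '\n', if there is
 * one, and return the resulting length. */
size_t chomp(char *line)
{
    size_t len = strcspn(line, "\n");

    line[len] = '\0';
    return len;
}

/* Hypothetical helper: free `used` heap-allocated strings (for example
 * buffers handed out by getline()) and then the pointer array itself. */
void free_lines(char **lines, size_t used)
{
    for (size_t i = 0; i < used; i++)
        free(lines[i]);
    free(lines);
}
```

Under these assumptions, calling chomp( lines[i] ) before puts( lines[i] ) would give single-spaced output, and free_lines( lines, usedLines ) would replace the final free( lines ). The POSIX description of getline() also says the buffer should be freed even when the call fails, so a free( line ) after the reading loop is probably worth adding as well.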
user3629249