-1

I'm trying to use strsep to remove extra characters in an CSV file. The problem is that when I run it, it gives me Segmentation Fault and I can't figure out why. Here's the code:

#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <stdbool.h>
#include <ctype.h>

typedef struct {

    int id, followers, following, public_gists; 
    int public_repos;
    char *login;
    char *type;
    char *created_at;
    int *follower_list;
    int *following_list;
        
} *User;

void checkUsersFile();
FILE *createCSV();
void corrigirFicheiro();
User criarUser();

int count = 0;

void checkUsersFile() {
    //Ficheiro "users.csv"
    FILE *file = fopen("ficheirosG1/users-set2.csv", "r");
    
    //Verifica se o ficheiro "users.csv" existe
    if(!file) {
        printf("Ficheiro não encontrado");
        return;
    }

    //Cria ficheiro "users-ok.csv"
    FILE *newFile = createCSV("users-ok.csv");

    corrigirFicheiro(file, newFile);

    printf("%d\n", count);
}

//Cria e retorna ficheiro "users-ok.csv"
FILE *createCSV(char *nome) {
    FILE *file = fopen(nome, "w");
    return file;
}

//Função responsável por intrepretar o ficheiro "users.csv" e colocar os dados corretos no ficheiro "users-ok.csv"
void corrigirFicheiro(FILE *file, FILE *newFile) {

    //imprimirPrimeiraLinha(file, newFile);

    char string[200000];
    //Uma linha do ficheiro com, no máximo, 200.000 caracteres

    while ((fgets(string, 200000, file))) {
        if (string[0] != '\0') {
            //1. Criar user
            //2. Print user

            User user = criarUser(&string);
            if (user != NULL) {
                printf("ok\n");
            }
            free(user);
            
        }
    }

    //free(string);

}

//Cria um User a partir de uma linha do ficheiro
User criarUser(char *str) {
    
    User novoUser;

    novoUser = (User) malloc(sizeof(User));
    
    for(int i = 0; i<10; i++) {

        //char *a = strdup(strsep(&str, ";"));
        //char *b = strdup(strsep(&a, "\n"));
        char *p = strsep(&str, ";\n\r");

        if (strlen(p) == 0) {
            count++;
            free(novoUser);
            return NULL;
        }
            
    }

    return novoUser;
}


int main(){
    checkUsersFile();

    return 0;
}

Using gdb to debug the code, it says that it occurs in the line if(strlen(p) == 0 { So it doesn't even enter the switch case. I don't know why this is happening.

Thank you

TelmoFP1
  • 1
  • 1
  • 3
    Please create a [mcve]. In particular we need to see how `criarUser` is called. – Nate Eldredge Nov 09 '21 at 18:43
  • Is it possible you're calling this function with NULL? Also, as a general guideline to debugging such bugs, try removing or commenting out code, until you find the minimal code that still has the bug, and removing something will stop the bug from happening. – Eran Zimmerman Gonen Nov 09 '21 at 18:44
  • I added another part for clarification. ```User *novoUser``` still doesnt work – TelmoFP1 Nov 09 '21 at 18:46
  • 2
    Do not hide pointer nature behind a `typedef`, as you are doing with `User`. This tends to confuse much more than it helps, and in this case, that appears to be a contributing factor. – John Bollinger Nov 09 '21 at 18:49
  • 1
    Also, do not cast the result of `malloc()` in C. Doing so can interfere with your compiler diagnosing programming errors. Or in this case, it gives a false impression that such an error is being disguised. – John Bollinger Nov 09 '21 at 18:51
  • @JohnBollinger I didn't understand what you tried to say, can you more specific? – TelmoFP1 Nov 09 '21 at 18:52
  • `char string[200000];` You cannot use an array with `strsep()` [Why doesn't strsep() work with pointers to the stack?](https://stackoverflow.com/questions/55543717/why-doesnt-strsep-work-with-pointers-to-the-stack/55543813?r=SearchResults&s=1|52.6586#55543813) – David C. Rankin Nov 09 '21 at 18:52
  • 3
    Your `User` is a `typedef` alias for a pointer-to-structure type. Where that alias is used, it is not clear that it represents a pointer type. If you want to use a `typedef` (which is *always* optional) then `typedef` the structure type, not a pointer to it. But if you are inexperienced with C then I would recommend avoiding `typedef`. Instead, declare your structure types with tags (`struct user { ... };`) and use them without a typedef. – John Bollinger Nov 09 '21 at 18:54
  • @DavidC.Rankin what can I use instead? – TelmoFP1 Nov 09 '21 at 18:56
  • `char string = malloc (200000);` will create a block of memory with *allocated storage duration* that will work with `strsep()` -- you must free the memory when done with it. You also need to pass the address of the pointer instead of a copy-of-a-pointer. From the man-page "This function modifies its first argument." -- if you pass a copy-of-the-pointer -- it modifies a copy and on return the changes are lost. – David C. Rankin Nov 09 '21 at 18:57
  • I agree with @JohnBollinger. Some minutes ago I left a comment about malloc, then I realized it was wrong, so I've deleted it. I was tricked by that typedef. I thought `user` was a struct, instead it's a *pointer* to a struct. – Fabio says Reinstate Monica Nov 09 '21 at 19:00
  • In other words you must call `User user = criarUser(&string);` passing `char **` to `criarUser()` not `char *`. – David C. Rankin Nov 09 '21 at 19:04
  • @DavidC.Rankin did what you suggested, the segmentation fault still pops up – TelmoFP1 Nov 09 '21 at 19:06
  • See [Is it a good idea to typedef pointers?](https://stackoverflow.com/q/750178/15168), to which the TL;DR answer is "Generally no, except perhaps for function pointers". – Jonathan Leffler Nov 09 '21 at 19:07
  • Do you have a short [MCVE](http://stackoverflow.com/help/mcve) that you can post so we can enlist the aid of a compiler without having to write a program to test your code? – David C. Rankin Nov 09 '21 at 19:08
  • `criarUser()` returns `NULL` or a not initialized area the size of the `User` struct. Is it really what you want? It returns `NULL` unless the CSV line has at least 10 fields? But no data is returned inside `User`... – arfneto Nov 09 '21 at 19:08
  • My problem is not with the typedef, I'm having problems with the strsep. The typedef was always there, I have to use it, it does not affect any function at all. – TelmoFP1 Nov 09 '21 at 19:09
  • You changed `char *p = strsep(str, ";\n\r");` as well right? – David C. Rankin Nov 09 '21 at 19:10
  • I tried to make a short MCVE, it runs and it gives Segmentation Fault. – TelmoFP1 Nov 09 '21 at 19:20
  • Your revised code does not segfault for me. Running it under valgrind does not give me any reason to think that it would do for you. Additionally, I don't think you've really caught the spirit of the M (minimal) part of "MCVE". – John Bollinger Nov 09 '21 at 19:35
  • Please [edit] your question and **add** your new [mre], my attempt does not show your problem. – the busybee Nov 09 '21 at 19:35
  • @thebusybee It doesn't return Segmentation Fault to you? When I run this code it does to me – TelmoFP1 Nov 09 '21 at 19:37
  • 1
    I am inclined to think that whatever program is segfaulting for you is not built from the code currently presented in the question. Perhaps, for example, you have forgotten to recompile, or perhaps you are simply running the wrong executable. – John Bollinger Nov 09 '21 at 19:40
  • I reduced your example to a 20 line source, giving it different strings, on the stack BTW, and looking at the returned values. `strsep()` returns the original pointer, and that is non-null. So `strlen()` gets a non-null pointer, actually I'm just printing the return value. – the busybee Nov 09 '21 at 19:41
  • Since I don't have a solution for you, I cannot show you my attempt, because it is no answer. Anyway, it is your task to reduce the issue to its minimal form. – the busybee Nov 09 '21 at 19:46
  • 1
    A [Mcve] needs to include the input file; otherwise nobody can test your program and it isn't *reproducible*. Or better, if the file handling is not related to the bug, remove it and call the offending function with a hardcoded string that causes the problem (thus making the example more *minimal*). – Nate Eldredge Nov 09 '21 at 19:52

2 Answers2

2

I see no reason to think that the strsep() call is responsible for the error you encounter.

This is wrong, however:

    User novoUser = (User) malloc(sizeof(User));

and it very likely is responsible for your error.

User is a pointer type, so sizeof(User) is the size of a pointer, which is not large enough for a structure of the kind that a User points to. When you later try to assign to the members of the structure to which it points (omitted) or to access them in printUser() (also omitted), you will overrun the bounds of the allocated object. That's exactly the kind of thing that might cause a segfault.

An excellent idiom for expressing an allocation such as that uses the receiving variable to establish the amount of space to allocate:

    User novoUser = malloc(sizeof(*novoUser));

Note that I have also removed the unneeded cast.


As I expressed in comments, however, it is poor style to hide pointer nature behind a typedef, as your User does, and personally, I don't much care even for most typedefs that avoid that pitfall.

Here is how you could do it with a better typedef:

typedef struct {
    int id, followers, following, public_gists; 
    int public_repos;
    char *login;
    char *type;
    char *created_at;
    int *follower_list;
    int *following_list;
} User;  // not a pointer

// ...

User *criarUser(char *str) {
    // ...
    User *novoUser = malloc(sizeof(*novoUser));  // Note: no change on the right-hand side
    // ...
    return novoUser;
}

But this is how I would do it, without a typedef:

struct user {
    int id, followers, following, public_gists; 
    int public_repos;
    char *login;
    char *type;
    char *created_at;
    int *follower_list;
    int *following_list;
};

// ...

struct user *criarUser(char *str) {
    // ...
    struct user *novoUser = malloc(sizeof(*novoUser));  // still no change on the right-hand side
    // ...
    return novoUser;
}
John Bollinger
  • 160,171
  • 8
  • 81
  • 157
  • It did not work – TelmoFP1 Nov 09 '21 at 19:34
  • 1
    What "it", @TelmoFP1? There is no question that the allocation in your original code is insufficient. If fixing that does not resolve your segfault then the problem is not captured by the code you've presented. – John Bollinger Nov 09 '21 at 19:37
1

In answer to you question I wrote a short example that illustrates what is needed. In essence you need to allocate storage for string and pass the address of string to criarUser(). You cannot use an array because the type passed to criarUser() would be pointer-to-array not pointer-to-pointer. (note: you can use the array so long as it is allowed to decay to a pointer so taking the address does not result in pointer-to-array -- example at end below)

An (close) example of corrigirFicheiro() with needed changes to use with allocated storage is:

void corrigirFicheiro(FILE *file, FILE *newFile)
{

    char *string = malloc (200000);           /* allocate for string */
    /* valdiate allocation here */
    char *original_ptr = string;              /* save original pointer */

    while ((fgets(string, 200000, file))) {
        if (string[0] != '\0') {
            User *user = criarUser(&string);  /* pass address of string */
            if (user != NULL) {
                printUser(user, newFile);
            }
            free(user);
        }
    }
    free (original_ptr);                      /* free original pointer */
}

An abbreviated struct was used in the following example, but the principal is the same. For sample input, you can simply pipe a couple of lines from printf and read on stdin and write to stdout. I used:

$ printf "one;two;3\nfour;five;6\n" | ./bin/strsep_example

A short MCVE (where I have taken liberty to remove the typedef'ed pointer) would be

#define _GNU_SOURCE

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAXC      128
#define DELIM     ";\r\n"
#define NFIELDS   3

typedef struct {
  char login[MAXC];
  char type[MAXC];
  int public_repos;
} User;

void printUser (User *user, FILE *newFile)
{
  fprintf (newFile, "\nlogin : %s\ntype  : %s\nrepo  : %d\n",
            user->login, user->type, user->public_repos);
}

User *criarUser(char **str)
{
    User *novoUser = malloc(sizeof *novoUser);
    /* validate allocation here */
    
    for(int i = 0; i<NFIELDS; i++) {

        char *p = strsep(str, DELIM);
        switch (i) {
          case 0: strcpy (novoUser->login, p);
            break;
          case 1: strcpy (novoUser->type, p);
            break;
          case 2: novoUser->public_repos = atoi(p);
            break;
        }
        if (strlen(p) == 0) {
            free(novoUser);
            return NULL;
        }
            
    }

    return novoUser;
}

void corrigirFicheiro(FILE *file, FILE *newFile)
{

    char *string = malloc (200000);           /* allocate for string */
    /* valdiate allocation here */
    char *original_ptr = string;              /* save original pointer */

    while ((fgets(string, 200000, file))) {
        if (string[0] != '\0') {
            User *user = criarUser(&string);  /* pass address of string */
            if (user != NULL) {
                printUser(user, newFile);
            }
            free(user);
        }
    }
    free (original_ptr);                      /* free original pointer */
}

int main (int argc, char **argv) {
  
  /* use filename provided as 1st argument (stdin by default) */
  FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;

  if (!fp) {  /* validate file open for reading */
    perror ("file open failed");
    return 1;
  }
  
  corrigirFicheiro(fp, stdout);
  
  if (fp != stdin)   /* close file if not stdin */
    fclose (fp);
}

Example Use/Output

$ printf "one;two;3\nfour;five;6\n" | ./bin/strsep_example

login : one
type  : two
repo  : 3

login : four
type  : five
repo  : 6

Memory Use/Error Check

In any code you write that dynamically allocates memory, you have 2 responsibilities regarding any block of memory allocated: (1) always preserve a pointer to the starting address for the block of memory so, (2) it can be freed when it is no longer needed.

It is imperative that you use a memory error checking program to ensure you do not attempt to access memory or write beyond/outside the bounds of your allocated block, attempt to read or base a conditional jump on an uninitialized value, and finally, to confirm that you free all the memory you have allocated.

For Linux valgrind is the normal choice. There are similar memory checkers for every platform. They are all simple to use, just run your program through it.

$ printf "one;two;3\nfour;five;6\n" | valgrind ./bin/strsep_example
==6411== Memcheck, a memory error detector
==6411== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==6411== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
==6411== Command: ./bin/strsep_example
==6411==

login : one
type  : two
repo  : 3

login : four
type  : five
repo  : 6
==6411==
==6411== HEAP SUMMARY:
==6411==     in use at exit: 0 bytes in 0 blocks
==6411==   total heap usage: 5 allocs, 5 frees, 205,640 bytes allocated
==6411==
==6411== All heap blocks were freed -- no leaks are possible
==6411==
==6411== For counts of detected and suppressed errors, rerun with: -v
==6411== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

Always confirm that you have freed all memory you have allocated and that there are no memory errors.

Look things over and let me know if you have questions.

Using An Array

When you pass the array as a parameter it decays to a pointer by virtue of array/pointer conversion. If that is done and the "array" type is no longer associated with the pointer you can then take the address and use automatic storage with strsep() as pointed out by @thebusybee.

The changes to the program above to do so would be:

User *criarUser(char *str)
{
    User *novoUser = malloc(sizeof *novoUser);
    /* validate allocation here */
    
    for(int i = 0; i<NFIELDS; i++) {

        char *p = strsep(&str, DELIM);
        switch (i) {
          case 0: strcpy (novoUser->login, p);
            break;
          case 1: strcpy (novoUser->type, p);
            break;
          case 2: novoUser->public_repos = atoi(p);
            break;
        }
        if (strlen(p) == 0) {
            free(novoUser);
            return NULL;
        }
            
    }

    return novoUser;
}

void corrigirFicheiro(FILE *file, FILE *newFile)
{

    char string[200000];

    while ((fgets(string, 200000, file))) {
        if (string[0] != '\0') {
            User *user = criarUser(string);  /* pass string, as pointer */
            if (user != NULL) {
                printUser(user, newFile);
            }
            free(user);
        }
    }
}

But note, without the benefit of array/pointer conversion when passing string as a parameter, you must use allocated storage. Up to you, but know the caveat.

David C. Rankin
  • 81,885
  • 6
  • 58
  • 85
  • `strsep()` works perfectly with character arrays on the stack. In the OP's source it is not called with the address of that array, but with the address of a pointer to the array. Both array and its address are objects on the stack, and this works for me. – the busybee Nov 09 '21 at 19:45
  • Right you are, you can pass a string with automatic storage duration and take it's address for `strsep()` if you pass though a parameter as is done here to hide the original type and prevent `&str` from being a pointer-to-array,, but then you have to hide the original type. If you always work with allocated storage for `strsep` you eliminate that issue. But point taken. – David C. Rankin Nov 09 '21 at 19:50
  • @thebusybee Added additional section to reflect the same. – David C. Rankin Nov 09 '21 at 19:55
  • Regarding "You cannot use an array because the type passed to criarUser() would be pointer-to-array not pointer-to-pointer" -- of course they can use an array. They ought to pass the array itself (which will decay to a pointer), but even passing the address of the array works because `char *` can be used freely in place of `void *` for function arguments (C17 6.2.5/28 and footnote 48) and the resulting conversion has the correct effect. – John Bollinger Nov 09 '21 at 20:21
  • I added a section, but will update the intro as well. – David C. Rankin Nov 09 '21 at 20:35