
I have a struct with arrays whose sizes are not specified, like this:

struct KMData{
    int ndata;
    int dim;
    float **features;
    int *assigns;
    int *labels;
    int nlabels;
};

In my function below, I try to malloc the memory for features and labels depending on the size of the given file, but it does not seem to allocate the amount of memory I specified.

struct KMData kmdata_load(char *datafile) {

    ssize_t tokens;
    ssize_t lines;

    filestats(datafile, &tokens, &lines);

    struct KMData *data = malloc(sizeof(struct KMData) + (lines * sizeof(int)) + ((tokens - (2*lines)) * sizeof(float)));
    
    printf("tokens: %zd, lines: %zd\n", tokens, lines);

    data->labels = malloc(lines * sizeof(int));
    data->features = malloc((tokens - (2*lines)) * sizeof(float));

    ssize_t f_size = sizeof(*(data->features));
    printf("size of features: %zd\n", f_size);

    FILE *fin = fopen(datafile, "r");

    char line[3150];

    int i = 0;

    while (fgets(line, 3150, fin)) {

        data->ndata++;
        char *token = strtok(line, " \t");

        data->labels[data->ndata-1] = atoi(token);

        float feats[(tokens/lines)-2];

        int f = 0;
        token = strtok(NULL, " \t");
        while ((token = strtok(NULL, " \t"))) {
            feats[f] = atof(token);
           
            data->features[data->ndata-1][f] = atof(token);

            if(i==0){
                printf("token %d: %f\n", f, data->features[data->ndata-1][f]);
            }
            f++;
        }
        i++;
        ssize_t size = sizeof(feats)/sizeof(float);
       
    }
    fclose(fin);
    
    return *data;
}

int main(int argc, char* argv[]){
  struct KMData data = kmdata_load(argv[1]);
}

Is there anything I missed here?

When I print out the size of features, it gives 8, which is way smaller than what I am expecting.

After reading the comments and answers, I tried doing this:

struct KMData *data = malloc(sizeof(struct KMData));
data->labels = malloc(lines * sizeof(int));

then

data->labels[data->ndata-1] = atoi(token);

but it gives me a segmentation fault. Am I still not allocating the array correctly?

  • "I have a struct with arrays ... " Incorrect. You have a `struct` with _pointers_. Pointers and arrays are not the same thing; one difference you've just discovered by printing the size. If `malloc` returns a valid pointer (not NULL), then it gave you the memory you asked for. `sizeof` won't know anything about that. The size of a pointer will be the same no matter how much memory (or none) it points to. If you want to keep track of how much memory you've allocated, you must do it "manually"; `sizeof` won't help you. – yano Apr 06 '23 at 19:24
  • looks like [this](https://stackoverflow.com/questions/1641957/is-an-array-name-a-pointer) and [this](https://stackoverflow.com/questions/3959705/are-arrays-pointers) might help – yano Apr 06 '23 at 19:28
  • Small detail: `strtok(NULL, " \t")` should be `strtok(NULL, " \t\n")` because `fgets()` retains the newline. – Weather Vane Apr 06 '23 at 19:48
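A minimal sketch of the delimiter point from the last comment, assuming each line has the layout the question's loop implies (a label, one more field that the loop skips, then the features). `parse_line` is a hypothetical helper, not part of the question's code. With `" \t\n"` as the delimiter set, a line ending in `" \n"` does not produce an extra `"\n"` token that `atof` would turn into a spurious `0.0` feature.

```c
#include <stdlib.h>
#include <string.h>

/* Parse one line read by fgets(), assumed to look like
   "<label> <skipped field> <f1> <f2> ...".
   Stores the label in *label and up to max features in out.
   Returns the number of features stored. */
size_t parse_line(char *line, int *label, float *out, size_t max)
{
    /* '\n' is a delimiter too, because fgets() keeps the newline. */
    const char *delims = " \t\n";

    char *token = strtok(line, delims);
    if (token == NULL)
        return 0;                        /* blank line */
    *label = atoi(token);

    if (strtok(NULL, delims) == NULL)    /* skip the second field, as the question does */
        return 0;

    size_t n = 0;
    while (n < max && (token = strtok(NULL, delims)) != NULL)
        out[n++] = strtof(token, NULL);
    return n;
}
```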

1 Answer

When a struct contains pointers, you allocate space for just the struct. This allocates space for the pointers themselves, but they are uninitialized and point at garbage.

struct KMData *data = malloc(sizeof(struct KMData));

Then you allocate space for each array and assign it to the corresponding pointer.

data->labels = malloc(lines * sizeof(int));
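Put together, the two steps might look like the sketch below. It reuses the question's `struct KMData`; `kmdata_alloc`, `kmdata_free` and the `nlines` parameter (standing in for the `lines` count that the question gets from `filestats`) are hypothetical names chosen for illustration.

```c
#include <stdlib.h>

struct KMData {
    int ndata;
    int dim;
    float **features;
    int *assigns;
    int *labels;
    int nlabels;
};

/* Allocate the struct, then the labels array it points to.
   Assumes nlines > 0. Returns NULL if an allocation fails. */
struct KMData *kmdata_alloc(size_t nlines)
{
    struct KMData *data = malloc(sizeof *data);   /* space for the struct only */
    if (data == NULL)
        return NULL;

    /* malloc does not zero memory, so set every member explicitly. */
    data->ndata = 0;
    data->dim = 0;
    data->nlabels = 0;
    data->features = NULL;
    data->assigns = NULL;

    data->labels = malloc(nlines * sizeof *data->labels);  /* one int per line */
    if (data->labels == NULL) {
        free(data);
        return NULL;
    }
    return data;
}

/* Frees the labels, assigns and the struct (features is not handled here). */
void kmdata_free(struct KMData *data)
{
    if (data == NULL)
        return;
    free(data->labels);
    free(data->assigns);
    free(data);
}
```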

When I print out the size of features, it gives 8, which is way smaller than what I am expecting.

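The struct only stores the pointer, which on a modern 64-bit computer is typically 8 bytes. That's why sizeof(*(data->features)), the size of a float *, is 8 (and so is sizeof(data->features)). There is no standard way to get the size of the memory a pointer points to; if you need it, you have to store it yourself, for example in ndata and dim.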
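A small sketch of "store it yourself": the hypothetical `FloatBuf` struct below keeps the element count next to the pointer, because `sizeof` on the pointer gives the same answer for a 2-element buffer and a 100000-element one.

```c
#include <stdlib.h>

/* Pairs a heap buffer with the number of elements it was allocated for.
   sizeof buf.data is sizeof(float *), whatever count is. */
struct FloatBuf {
    float *data;
    size_t count;
};

struct FloatBuf floatbuf_alloc(size_t count)
{
    struct FloatBuf buf = { malloc(count * sizeof(float)), 0 };
    if (buf.data != NULL)
        buf.count = count;   /* remember the size ourselves */
    return buf;
}
```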

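Because data->features is a float **, it points to a list of pointers, each of which should point to a list of floats (one list per line). Allocate the list of pointers first, with sizeof(float*) times the number of lines, then allocate a list of floats for each line and store it in the corresponding pointer. Your original code allocated a single block of floats and then used it as a list of pointers, so data->features[data->ndata-1] read garbage as an address.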

// Allocate space for one pointer (to a list of floats) per line.
data->features = malloc(lines * sizeof(float*));

// Allocate space for the features (floats) on the first line.
// Each line has (tokens / lines) - 2 features; two tokens per line are not features.
data->features[0] = malloc(((tokens / lines) - 2) * sizeof(float));
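Doing this for every line, rather than just the first, might look like the following sketch. `nrows` would be the number of lines and `ncols` the number of features per line (the question's `(tokens/lines)-2`); `features_alloc` and `features_free` are hypothetical names.

```c
#include <stdlib.h>

/* Allocate nrows row pointers, then one array of ncols floats per row.
   On failure, frees whatever was already allocated and returns NULL. */
float **features_alloc(size_t nrows, size_t ncols)
{
    float **rows = malloc(nrows * sizeof *rows);     /* the list of pointers */
    if (rows == NULL)
        return NULL;

    for (size_t i = 0; i < nrows; i++) {
        rows[i] = malloc(ncols * sizeof *rows[i]);   /* one row of floats */
        if (rows[i] == NULL) {
            while (i > 0)
                free(rows[--i]);                     /* undo rows 0..i-1 */
            free(rows);
            return NULL;
        }
    }
    return rows;
}

/* Free each row, then the list of row pointers. */
void features_free(float **rows, size_t nrows)
{
    if (rows == NULL)
        return;
    for (size_t i = 0; i < nrows; i++)
        free(rows[i]);
    free(rows);
}
```

In the question's loop, `data->features = features_alloc(lines, (tokens / lines) - 2);` would then make `data->features[data->ndata-1][f]` a valid element.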

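malloc does not clear the allocated memory; it will contain whatever was in it before. Sometimes this is 0, often it is garbage. You must initialize every member of the struct yourself. In your code, data->ndata is never initialized, so data->ndata++ starts from an arbitrary value and data->labels[data->ndata-1] can index far outside the array, which is a likely cause of your segmentation fault.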

data->ndata = 0;
data->dim = 0;
data->nlabels = 0;
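A shorter way to initialize every member at once is a C99 compound literal, which sets the int members to 0 and the pointer members to NULL. This sketch wraps it in a hypothetical `kmdata_new` helper; `calloc` is a common alternative, though all-bits-zero is not guaranteed by the C standard to be a null pointer.

```c
#include <stdlib.h>

struct KMData {
    int ndata;
    int dim;
    float **features;
    int *assigns;
    int *labels;
    int nlabels;
};

/* Allocate a KMData whose members are all 0 or NULL. */
struct KMData *kmdata_new(void)
{
    struct KMData *data = malloc(sizeof *data);
    if (data != NULL)
        *data = (struct KMData){0};   /* every member becomes 0 or NULL */
    return data;
}
```

With `ndata` starting at 0, the question's `data->ndata++` followed by `data->labels[data->ndata-1]` writes to `labels[0]` for the first line.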
Schwern
  • I want features to be a 2D array. If I do it the way you provided, will it become a 2D array? – Daisuke Oto Apr 06 '23 at 22:02
  • @DaisukeOto, you will be able to use two indexes to access the elements: `float x = data->features[i][j];`, but that does not make it a true 2D array. Arrays are contiguously allocated, and that will be the case only within the block allocated by each `malloc()` call, not for the data overall. – John Bollinger Apr 06 '23 at 22:46
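If one contiguous block is wanted, as the last comment describes, a common technique (not taken from the answer) is to allocate all the floats in one `malloc()` and point each row into it, so `features[i][j]` still works. The sketch below assumes `nrows > 0` and `ncols > 0`; `features_alloc_contiguous` and `features_free_contiguous` are hypothetical names.

```c
#include <stdlib.h>

/* One contiguous block holds all nrows * ncols floats; rows[i] points
   at the start of row i inside that block. Returns NULL on failure. */
float **features_alloc_contiguous(size_t nrows, size_t ncols)
{
    if (nrows == 0 || ncols == 0)
        return NULL;

    float **rows = malloc(nrows * sizeof *rows);
    float *block = malloc(nrows * ncols * sizeof *block);
    if (rows == NULL || block == NULL) {
        free(rows);
        free(block);
        return NULL;
    }
    for (size_t i = 0; i < nrows; i++)
        rows[i] = block + i * ncols;   /* row i starts i*ncols floats in */
    return rows;
}

/* rows[0] is the start of the single data block. */
void features_free_contiguous(float **rows)
{
    if (rows == NULL)
        return;
    free(rows[0]);
    free(rows);
}
```

Only two allocations are needed regardless of the number of lines, and the data can be walked linearly as `rows[0][i * ncols + j]`.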