Making a 2d array from a text file

Question

Im working with a sparse matrix and im given a text file like this:

Basically the way it works is the number 1.2 goes on the position [0] [3] and the elements of the matriz that are not mentioned stay at 0, so in this case it should look like this:

5.2 0 0 1.2 0 0
0 0 0 0 0 0
0 0 0 0 0 3.2
2.1 0 0 0 0 4.2
0 0 0 0 0 2.2

When you say you don't know the size, does that only applies to the rows or does it apply to the columns as well? — Pablo, Apr 18 '18 at 21:43
1)allocate enough for 10000 elements 2) read the file 3) realloc to a smaller size. — wildplasser, Apr 18 '18 at 21:44
What do the individual lines mean. Also is the matrix square? Once you have the answer to these questions how would you start to solve the problem? — David Harris, Apr 18 '18 at 21:46
Pablo i dont know the number of rows or comuns, so i cant declare values for both of them at the at the start of my code — MuchoG, Apr 18 '18 at 21:47
Ok, so you don't know the number of columns either. Do all rows have the same number of rows? — Pablo, Apr 18 '18 at 21:49
Possible duplicate of [Reading matrix from a text file to 2D integer array C++](https://stackoverflow.com/questions/15588800/reading-matrix-from-a-text-file-to-2d-integer-array-c) — , Apr 18 '18 at 21:49
@xorz57 this is a C question, a C++ answer is not a possible duplicate. — Pablo, Apr 18 '18 at 21:51
read the file, figure out the size, then `fseek` back to the beginning of the file, and store the data. — bruceg, Apr 18 '18 at 21:52
Im so sorry but my teacher clarified everything just now... it turns out that, for each line, the first number is the row, the second the column and the third the element.with the example i have above, 1.2 has to go in the position [0] [3]. The matrix does not have to be square — MuchoG, Apr 18 '18 at 21:57
That's a completely different problem. Please edit your question with this clarification. — Pablo, Apr 18 '18 at 21:58
If this is truly the contents of your file, then you don't have enough information. In this case above your matrix could be 6x6, 10x6, 100x100, etc. All of these are different. — MFisherKDX, Apr 18 '18 at 21:58
I edited to clarify the changes, its a little diferent from what i was asking in the first question — MuchoG, Apr 18 '18 at 22:17
The whole point of the topic of "sparse matrices" is to avoid building the whole matrix, but instead use special methods to do the manipulation. These methods would be inefficient normally, but are much more efficient for sparse matrices. So, it's probably a bad idea to create a 2D array from a specification of a sparse matrix, unless the teacher has explicitly said to do so as a step in the learning process. — Jeff Learman, Apr 18 '18 at 23:48
@JeffLearman Do you think i should create a scruct for a element of the sparse matrix thats not 0? Like int row, int column and double value. Maybe it would be better than dealing with a large matrix with a lot of zeros — MuchoG, Apr 19 '18 at 00:42
@MuchoG: Yes, that's usually the whole point of treating sparse matrices as a special class. It depends on what you're supposed to do next with the matrix, and what the teacher intends to teach in this exercise. Read the wikipedia article, which mentions several different types of data structures often used. The input file is a Coordinate List, which is the easiest to understand — Jeff Learman, Apr 20 '18 at 15:37

score 3 · Accepted Answer · answered Apr 18 '18 at 22:46

OP wrote this in the comments:

Im so sorry but my teacher clarified everything just now... it turns out that, for each line, the first number is the row, the second the column and the third the element.with the example i have above, 1.2 has to go in the position [0][3]. The matrix does not have to be square.

This makes every thing different. If you don't know the dimensions of the matrix, then you have to read everything first, then calculate the matrix dimensions, allocate space for the matrix and then fill it with the values.

I'd do this:

#include <stdio.h>
#include <stdlib.h>

#define BLOCK 1024

struct matrix_info {
    int col;
    int row;
    double val;
};

void free_matrix(double **matrix, size_t rows)
{
    if(matrix == NULL)
        return;

    for(size_t i = 0; i < rows; ++i)
        free(matrix[i]);
    free(matrix);
}

double **readmatrix(const char *fname, size_t *rows, size_t *cols)
{
    if(fname == NULL || rows == NULL || cols == NULL)
        return NULL;

    double **matrix = NULL;
    struct matrix_info *info = NULL;
    size_t mi_idx = 0; // matrix info index
    size_t mi_size = 0;

    FILE *fp = fopen(fname, "r");
    if(fp == NULL)
    {
        fprintf(stderr, "Cannot open %s\n", fname);
        return NULL;
    }

    *rows = 0;
    *cols = 0;

    for(;;)
    {
        if(mi_idx >= mi_size)
        {
            struct matrix_info *tmp = realloc(info, (mi_size + BLOCK) * sizeof *info);
            if(tmp == NULL)
            {
                fprintf(stderr, "not enough memory\n");
                free(info);
                fclose(fp);
                return NULL;
            }

            info = tmp;
            mi_size += BLOCK;
        }

        int ret = fscanf(fp, "%d %d %lf", &info[mi_idx].row, &info[mi_idx].col,
                    &info[mi_idx].val);

        if(ret == EOF)
            break; // end of file reached

        if(ret != 3)
        {
            fprintf(stderr, "Error parsing matrix\n");
            free(info);
            fclose(fp);
            return NULL;
        }

        if(*rows < info[mi_idx].row)
            *rows = info[mi_idx].row;

        if(*cols < info[mi_idx].col)
            *cols = info[mi_idx].col;

        mi_idx++;
    }

    fclose(fp);

    // mi_idx is now the length of info
    // *cols and *rows have the largest index
    // for the matrix, hence the dimension is (rows + 1) x (cols + 1)
    (*cols)++;
    (*rows)++;

    // allocating memory

    matrix = calloc(*rows, sizeof *matrix);
    if(matrix == NULL)
    {
        fprintf(stderr, "Not enough memory\n");
        free(info);
        return NULL;
    }

    for(size_t i = 0; i < *rows; ++i)
    {
        matrix[i] = calloc(*cols, sizeof **matrix);
        if(matrix[i] == NULL)
        {
            fprintf(stderr, "Not enough memory\n");
            free(info);
            free_matrix(matrix, *rows);
            return NULL;
        }
    }

    // populating matrix

    for(size_t i = 0; i < mi_idx; ++i)
    {
        int r,c;
        r = info[i].row;
        c = info[i].col;
        matrix[r][c] = info[i].val;
    }

    free(info);
    return matrix;
}

int main(void)
{
    const char *fn = "/tmp/matrix.txt";

    size_t rows, cols;

    double **matrix = readmatrix(fn, &rows, &cols);

    if(matrix == NULL)
        return 1;

    for(size_t i = 0; i < rows; ++i)
    {
        for(size_t j = 0; j < cols; ++j)
            printf("%0.3f ", matrix[i][j]);

        puts("");
    }

    free_matrix(matrix, rows);
    return 0;
}

The output is (for a file with your sample data)

5.200 0.000 0.000 1.200 0.000 0.000 
0.000 0.000 0.000 0.000 0.000 0.000 
0.000 0.000 0.000 0.000 0.000 3.200 
2.100 0.000 0.000 0.000 0.000 4.200 
0.000 0.000 0.000 0.000 0.000 2.200

So a quick explanation of what I'm doing:

I read the file and store in an dynamically allocated array the information about the column, the row and the value. This information is stored in the struct matrix_info *info.

The idea is that I read every line and extract the three values. While I read the file, I also store the largest index for the column and the row

    ...

    if(*rows < info[mi_idx].row)
        *rows = info[mi_idx].row;

    if(*cols < info[mi_idx].col)
        *cols = info[mi_idx].col;

    ...

so when the file is read, I know the dimensions of the matrix. Now all values with their row & column are stored in the info array, so the next step is to allocate memory for the matrix and fill the values based based on the info[i] entries.

for(size_t i = 0; i < mi_idx; ++i)
{
    int r,c;
    r = info[i].row;
    c = info[i].col;
    matrix[r][c] = info[i].val;
}

At the end I free the memory for info and return the matrix.

Another interesting part is this:

    if(mi_idx >= mi_size)
    {
        struct matrix_info *tmp = realloc(info, (mi_size + BLOCK) * sizeof *info);
        if(tmp == NULL)
        {
            fprintf(stderr, "not enough memory\n");
            free(info);
            fclose(fp);
            return NULL;
        }

        info = tmp;
        mi_size += BLOCK;
    }

Because you mentioned that the only thing you know about the matrix is that it might contain up to 10000 elements, then the input file might be very big. Instead of reallocating memory for the info elements on every loop, I allocate chunks of 1024 (BLOCK) info elements at a time. Thus once a block is full, the next block is allocated and so on. So I call realloc only every 1024 iterations.

I understand what you are doing @Pablo, but it really makes no sense to represent a sparse matrix this way -- ie: as a dense matrix. I suspect the OP is still missing something from the assignment. — MFisherKDX, Apr 18 '18 at 22:53
@MFisherKDX without the proper specification, guesses have to be made. I decided not to make any guess as to how large the dimensions of the matrix are, so I need to read all lines before I can determine the size of the matrix. This is the more general approach. Of course, I you know that upper bound of the matrix (or other properties of the matrix), you can simplify the code. And we don't know if the OP's matrix is a sparse one. The input file might be very large and contain a value for all matrix indices, the OP might have chosen to show us only a few lines. — Pablo, Apr 18 '18 at 22:59
True, but the first line of the OP's problem statement is: *Im working with a sparse matrix and im given a text file like this:*. So I am assuming the OP indeed has a sparse matrix. — MFisherKDX, Apr 18 '18 at 23:00
Thanks! In this case, i call the file from the terminal (like: /.project ) so i think i have to read and make the matrix in the main function — MuchoG, Apr 18 '18 at 23:01
@MuchoG from my code, you have to change `main` to `int main(int argc, char **argv)` and then call `readmatrix(argv[1], &rows, &cols);`. Of course don't forget to check that `argv[1]` is not `NULL` — Pablo, Apr 18 '18 at 23:04
Although this obviously close to what OP wanted, it does not make a 2D array as per the title "Making a 2d array from a text file" but a pointer to pointers. — chux - Reinstate Monica, Apr 18 '18 at 23:04
@MFisherKDX in my assignment it just says that a sparse matrix is a matrix where a lot of elements are 0, but it doesn´t say it cant be dense. I think the goal is just learning to work with C, structs, and a little bit of pointers. The class is Introduction to Algorythms and Data Structures so its not very complex — MuchoG, Apr 18 '18 at 23:11
Yeah @chux my goal is to, with the requirements i said earlier, to create a 2d array from the file that i can work with — MuchoG, Apr 18 '18 at 23:17

score 2 · Answer 2 · answered Apr 18 '18 at 22:06

2

you need use in first :

float* sparseMatrix = malloc(sizeof(float) * 10000);

You start read the file and after the first line read you know the nomber of colums and the number of row is the number of line read. After you can reduce the matrix if you want.

free(sparseMatrix );
sparseMatrix = malloc(sizeof(float) * nbRow*nbColum);

answered Apr 18 '18 at 22:06

Alban Santolaria

33
3

That would destroy the sparse matrix and the read values would be gone. You have to use `realloc` to reduce the amount of allocated memory. – Pablo Apr 18 '18 at 22:47

MFisherKDX · Answer 3 · 2018-04-18T22:41:38.090

You simply don't have enough information to construct an appropriate matrix. In your cited case, you know you have AT LEAST 5 rows and AT LEAST 6 columns, but you don't know exactly how many rows m and columns n are in your matrix. So for your given input:

You could have a 5x6 matrix as:

5.2  0.0  0.0  1.2  0.0  0.0
0.0  0.0  0.0  0.0  0.0  0.0
0.0  0.0  0.0  0.0  0.0  3.2
2.1  0.0  0.0  0.0  0.0  4.2
0.0  0.0  0.0  0.0  0.0  2.2

Or you could have a 10x6 matrix as:

5.2  0.0  0.0  1.2  0.0  0.0
0.0  0.0  0.0  0.0  0.0  0.0
0.0  0.0  0.0  0.0  0.0  3.2
2.1  0.0  0.0  0.0  0.0  4.2
0.0  0.0  0.0  0.0  0.0  2.2
0.0  0.0  0.0  0.0  0.0  0.0
0.0  0.0  0.0  0.0  0.0  0.0
0.0  0.0  0.0  0.0  0.0  0.0
0.0  0.0  0.0  0.0  0.0  0.0
0.0  0.0  0.0  0.0  0.0  0.0

This ambiguity is a problem as the first matrix is vastly different that the second.

Also, the point of a sparse matrix is to be efficient with memory and/or processing time. If you allocate a full array of m rows and n columns then you have a dense matrix representation instead.

In this case, since 2.2 is in the position [4][5] and there´s no element in a bigger row or column, the matrix should have 4 rows and 5 columns. If it was an element not 0 in the position [5] [5] then the matrix should have 5 rows and 5 columns — MuchoG, Apr 18 '18 at 22:40

score 0 · Answer 4 · edited Apr 22 '18 at 08:47

0

You can measure the number of rows and columns and then define your 2d array. Of course, this will increase the time complexity! If you don't care about the size of the memory, you can define your array with the max columns and rows! So you should choose between big time complexity and big memory size.

edited Apr 22 '18 at 08:47

marc_s

732,580
175
1,330
1,459

answered Apr 18 '18 at 21:41

Panos Antoniadis

108
7

score 0 · Answer 5 · answered Apr 18 '18 at 21:55

There's no real way of knowing what's in a file unless you completely read the file yourself first.

Since you know that there will be max 10k elements, you can statically allocate an array of that size first, and then load the numbers into the until you parse an EOF.

float sparseMatrix[10000];

It just means that your program will always allocate the space for 10k elements regardless of how many elements are actually in the array. If you assume that each element takes up 4 bytes, then you'll only be using ~40kB of memory. This is probably the easiest solution.

The other option would be to read the file fully, figure out the required size, and dynamically allocate that size, then read through the whole file again whilst populating the elements.

// Assume numberOfElements was determined by reading through the file first
float* sparseMatrix = malloc(sizeof(float) * numberOfElements);

Though this method will use less memory, it requires two full reads of the file + the overhead of calling malloc.

I don't think a sparse matrix should be represented this way. It should be represented in such a way to conserve memory and/or processing time. — MFisherKDX, Apr 18 '18 at 22:01

Making a 2d array from a text file

5 Answers5

Linked