-1

When I do not open the file(180mb) I get no error, but when I open the file I get a Segmentation fault 11. I am fairly certain that I am exceeding memory for the process. Anyone have any tips?

I have changed parameters. So if NUM_TRAIN = 10 instead of 60000, I do not get a segmentation fault. When NUM_TRAIN = 60000, but I do not open the file, I also do not get a segmentation fault. When NUM_TRAIN = 60000 and I open the file, I get a segmentation fault.

#include <stdio.h>
#define NUM_TRAIN 60000
#define IMAGE_SIZE 784
#define TRAIN_IMAGES "mnist_train.csv"
int main()
{
  int arr[NUM_TRAIN][IMAGE_SIZE];
  int arr2[NUM_TRAIN];
  FILE* fTrain = fopen(TRAIN_IMAGES, "r");
  return 0;
}

I do not expect to get a segmentation fault but I am, specifically a Segmentation fault 11

  • 3
    Allocate your arrays on the heap instead of the stack, by using malloc. Or, define the array outside of main(), which will put it into the global memory area. – Robert Harvey Jun 21 '19 at 17:06

2 Answers2

1

You are creating these arrays on the stack. Unlike the heap, this has a strictly limited capacity and cannot be expanded. Limits between 1-8MB are common.

Instead allocate on the heap:

int **arr = malloc(sizeof(int *) * NUM_TRAIN);

for (int i=0; i < NUM_TRAIN; ++i)
{
   arr[i] = malloc(sizeof(int) * IMAGE_SIZE);
}

A two-dimensional array of int is in fact an array of pointers to arrays of int.

You ought to check for errors during the caller to malloc() - omitted here.

marko
  • 9,029
  • 4
  • 30
  • 46
  • 1
    Do not use `int **arr` for simple two-dimensional arrays. It wastes space and time. Use `int (*arr)[IMAGE_SIZE];` and `arr = malloc(NUM_TRAIN * sizeof *arr);`. – Eric Postpischil Jun 21 '19 at 17:09
  • 1
    A two-dimensional array of `int` is not in fact an array of pointers to arrays of `int`. – Eric Postpischil Jun 21 '19 at 17:10
  • It's been ages since I did C... why isn't he getting the seg fault when he doesn't open the file? (Maybe the compiler optimizes the allocation away because 'not opening the file' means he commented the line out?) – Eric J. Jun 21 '19 at 17:11
  • 1
    @EricJ.: Most likely OP is not showing the full code and, when the file is not opened, OP is also omitting the code that would use the arrays, or the compiler is able to deduce they are not used, and therefore the compiler optimizes away the creation of the arrays. Then stack is not used, so there is no segmentation fault. – Eric Postpischil Jun 21 '19 at 17:12
  • Allocating the array causes the stack pointer to exceeds the stack limit; the next thing that is pushed there (for instance, a function call) blows it up. Could equally be the next variable that is spilled. – marko Jun 21 '19 at 17:14
  • 2
    @marko: The question Eric J. asked is not why the stack overflows but rather why there is a difference between whether it does or does not overflow as a consequence of whether the file is opened or not. – Eric Postpischil Jun 21 '19 at 17:17
  • @EricPostpischil No, I posted the entirety of the code. This was in a test file and the exact behavior I discussed is what took place. – John Mancini Jun 21 '19 at 17:18
  • 1
    @JohnMancini: Then a proper question and answer requires posting the complete version information of the compiler and other tools and the exact commands used to compile and run, possibly along with enough information to reproduce the input file sufficiently to reproduce the crash. As stated, no good compiler would exhibit the difference in behavior reported; it would either keep the arrays (if optimization were disabled) and crash regardless of whether the file were opened or not or remove the arrays (if optimization were enabled) and not crash regardless. – Eric Postpischil Jun 21 '19 at 17:19
  • @EricPostpischil none of the declarations of the arrays write to the stack (don't pay for initialisation when it's not needed)- but calling `fopen()` pushes a stack frame to it. Presumably by 'not opening the file', the OP is commenting out this line. – marko Jun 21 '19 at 17:22
  • @JohnMancini: One other possibility is the crash is actually caused by `fTrain` and/or the stack frame to call `fopen` being located below the arrays and used if the file is opened but not if the file is not opened, and hence there would be a difference. But that is still a sign of a compiler shortcoming, as it requires space for the arrays to be allocated even though they are not used but also for that space not to be touched so as to avoid causing a crash. – Eric Postpischil Jun 21 '19 at 17:23
  • The push of a stack frame on function call would happen first in that case. No guarantees that `fTrain` is ever written to the stack at all - could easy remain in a register. The stack frame will always be below the current frame. – marko Jun 21 '19 at 17:25
1

I fixed my issue. Basically I was allocating way too much static memory with the arrays, so I changed the code to dynamically allocate the memory to the heap. Code is as follows:

#include <stdio.h>
#include <stdlib.h>
#define NUM_TRAIN 60000
#define IMAGE_SIZE 784
#define TRAIN_IMAGES "mnist_train.csv"
int main()
{
  int** arr = (int**)malloc(sizeof(int*)*NUM_TRAIN);
  for (int i = 0; i < NUM_TRAIN; i++)
    arr[i] = (int*)malloc(sizeof(int)*IMAGE_SIZE);
  int* arr2 = (int*)malloc(sizeof(int)*NUM_TRAIN);
  FILE* fTrain = fopen(TRAIN_IMAGES, "r");
  return 0;
}
  • 1
    Do not use `int **arr` for simple two-dimensional arrays. It wastes space and time. Use `int (*arr)[IMAGE_SIZE];` and `arr = malloc(NUM_TRAIN * sizeof *arr);`. – Eric Postpischil Jun 21 '19 at 17:11
  • 2
    Re “I was allocating way too much static memory”: Inside a function, int arr[NUM_TRAIN][IMAGE_SIZE]; does not allocate static memory; it allocates automatic memory, which is most often implemented with a stack. – Eric Postpischil Jun 21 '19 at 17:25
  • @EricPostpischil Can you explain why that is more space and time efficient? – John Mancini Jun 21 '19 at 17:27
  • 1
    @JohnMancini, because Eric's version requires only one `malloc()` call (and later, only one `free()`), as opposed to `NUM_TRAIN + 1` of them, and it reserves space only for the needed `int`s, not for an additional `NUM_TRAIN` pointers. Also, one less indirection is required for access, and all the `int`s are certain to be in a single, contiguous block, which may contribute to better cache locality. All in all, Eric's version is *much* better. – John Bollinger Jun 21 '19 at 17:30
  • 1
    Not to mention that the two approaches produce objects that, although they can be accessed with the same indexing syntax, are in fact of importantly different type. The difference becomes apparent when you pass them as function arguments, among other places. – John Bollinger Jun 21 '19 at 17:35