How to deal with SEGFAULT-s as a beginner? (C)

Question

I have a problem on SPOJ and a deadline soon.

I'm tasked to write a program that counts how many identifiers are in a given line. Identifier is defined a sequence of characters from set 'a'-'z' or 'A'-'Z' or '0'-'9' or '_', starting from any letter or underline character ('_').

Input

There are given some number of data sets. Each data set is a line consisting from the sequence of some number of words, separated by spaces and finishing with the end of line character (even the last one line). A word is a sequence of any ASCII character of code from 33 till 126 (see http://www.asciitable.com for more details), e.g., aqui28$-3q or _dat_ The second word is an identifier, but the first one is not.

Output

The number of identifiers in each line.

Example

Input:

Dato25 has 2 c-ats and 3 _dogs
op8ax _yu _yu67 great-job ax~no identifier.

Output:

4
3

The code I wrote compiles, but when submitting it returns SIGSEGV (Segmentation Fault).

`((c = getchar()) != EOF)` you define `char c;` but `getchar` returns an `int`. With `char` you might not be able to distinguish `EOF` from valid characters. — Gerhardh, Jan 19 '23 at 09:12
The size of `struct WORDS line[100];` is `100 * ( 4 + 50 * 300 )`. Quite huge for a stack variable. — mch, Jan 19 '23 at 09:19
You might simplify your conditions by using `isalpha`, `isdigit` and similar functions — Gerhardh, Jan 19 '23 at 09:20
@Haris I changed it to int but still the same problem. I think the issue is with my 2 dimensional array, maybe too much extra space being allocated? Sorry if my comment didn't make sense, I've heard about memory allocation but have 0 idea how to use it. — andromedstrain, Jan 19 '23 at 09:21
See if your case matches: https://stackoverflow.com/q/33047452/20017547 — Harith, Jan 19 '23 at 09:21
@mch should I start reading about memory allocation or there's something else? — andromedstrain, Jan 19 '23 at 09:22
One simple answer is to abandon ideas of attempting to use SPOJ or CHEF etc as learning resources. At best you'll pick up bad habits. They are amusements for people who already code. One exception is perhaps Project Euler, where you should already know how to code, but are learning math algorithms etc. — Weather Vane, Jan 19 '23 at 09:23
Does this answer your question? [Undefined, unspecified and implementation-defined behavior](https://stackoverflow.com/questions/2397984/undefined-unspecified-and-implementation-defined-behavior) — Karl Knechtel, Jan 19 '23 at 09:23
Does this answer your question? [Getting a stack overflow exception when declaring a large array](https://stackoverflow.com/questions/571945/getting-a-stack-overflow-exception-when-declaring-a-large-array) — Harith, Jan 19 '23 at 09:24
You do not protect your arrays against too long input lines. What will happen if you read more than 300 characters before hitting a `' '` or `'\n'` or `EOF`? — Gerhardh, Jan 19 '23 at 09:25
"How do I do X as a beginner?" isn't an answerable question here. First off, this is **not a discussion forum**, so we don't offer guides or general tutorials - we need a **specific, concrete question about a specific, concrete problem** to answer. If there is an error in code, this means first [debugging](https://ericlippert.com/2014/03/05/how-to-debug-small-programs/) to find a problem, and then creating a [mre]. Second: being a beginner **does not change** how the language works. — Karl Knechtel, Jan 19 '23 at 09:25
`struct WORDS line[100];` this array uses approximately 1.5 Mbytes. This might be too much for a local variable. Try `static struct WORDS line[100];`. — Jabberwocky, Jan 19 '23 at 09:25
@WeatherVane I would follow your advice, the problem is that a professor in my college has his whole course set up around SPOJ, so not much I can do. — andromedstrain, Jan 19 '23 at 09:28
@andromedstrain how do you test your code? Do you just submit it to SPOJ or do you have a local development environnement? — Jabberwocky, Jan 19 '23 at 09:29
@Jabberwocky I think of some test cases, I compile the code using gcc and submit the different test cases. If those work, then I submit to SPOJ. — andromedstrain, Jan 19 '23 at 09:32
@andromedstrain your code works fine on my machine provided I replaced `struct WORDS line[100];` with `static struct WORDS line[100];`. Did you try that? This issue has been mentioned in three different comments. — Jabberwocky, Jan 19 '23 at 09:33
@Jabberwocky worked fine on my machine but SPOJ still gives seg fault. Thanks for the attempt. — andromedstrain, Jan 19 '23 at 09:35
I make submissions to SPOJ. I create my own set of test cases that stretch the given limits of the problem, and test corner cases. I first make a crude solution that will solve the given test cases, but not the actual test due to the time constraint. I use the simple code to generate answers to my own tests, and as I make the code more sophisticated I can use increasingly complex test cases, building on what I have done. — Weather Vane, Jan 19 '23 at 09:35
The member `ident_count` was not initialised before incrementing it. You haven't shown the actual problem link, but why is it necessary to use such a huge array? If each test case is one line, you don't need to store any previous lines. After you increment the line index `i` you never go back to previous lines. — Weather Vane, Jan 19 '23 at 09:47

John Bollinger · Answer 1 · 2023-01-19T16:23:32.607

Your code exhibits a lamentably common anti-pattern: unnecessarily reading a large chunk of data into memory before processing it, when it could instead be processed as you go.

Consider: when you're processing one input word, do you need to refer to previous or subsequent words of the same or any other line? No. So why are you keeping all that around in memory?

In fact, you don't need to store any part of any of the words, except a single character you have just read. You simply need to

track
- how many identifiers you've seen so far on the current line and
- what kind of thing you're parsing at any given time (possible identifier, non-identifier word, or spaces),
update that appropriately for each character read, and
emit appropriate output (based on the preceding) at the end of each line.

That is likely to be faster than your approach. It will certainly use less memory, and especially less stack memory. And it affords little to no room for any kind of bounds overrun or invalid pointer use, such as are the usual reasons for a memory error such as a segfault.

As to why your original program segfaults, I ran it using valgrind, which is a popular tool for identifying memory usage problems. It can detect memory leaks, some out-of-bounds accesses, and use of uninitialized memory, among other things. It showed me that you never initialize the ident_count of any line[i]. Non-static local variables such as your line are not automatically initialized to anything in particular. Sometimes you can luck out with that, and it's not the cause of your particular issue, but cultivate good programming practices: fix it.

Valgrind did not indicate any other errors to me, however, nor did your program segfault for me with the example input. Nevertheless, I anticipate that I could wreak all kinds of havoc in your program by feeding it input with more than 100 lines and / or more than 300 words in a line, and / or more than 50 characters in a word. Automated judges tend to include test cases that explore the extremes of the problem space, so you need to be sure that your program works for all valid inputs.

Alternatively, a valid point is made in comments that you are allocating a large object on the stack, and stack space may not be sufficient for it in the judge's test environment. If that's the issue, then a quick and easy way to resolve it in your current code would be to allocate only one struct WORDS and reuse it for every line. That will reduce your stack usage by about a factor of 100, and again, what purpose is served by storing all the lines in memory at the same time anyway?

Thanks a lot for your answer. Got lot to learn. – andromedstrain Jan 20 '23 at 17:14 — andromedstrain, Jan 20 '23 at 17:14

How to deal with SEGFAULT-s as a beginner? (C)

1 Answers1