0

Good evening,

I'm having trouble with an assignement.

Basically we're required to code a program which will calculate the prime factors of a given stdin. The data may only enter the program through its stdin, be it throuh an echo or a < file.txt. The stream of data will never be greater than 80 characters (they can be numbers or not).

The functions I use in the program are read(), strotol() and strtok(), and the "irrelevant" code flows as follows:

  1. Uses malloc to allocate 80 initial bytes of memory.
  2. Stores in an int, through read() the number of characters read (and I believe, the last \0).
  3. reallocates memory with realloc() so as to save as much memory (I know in this case it's trivial, but oh well...).

Now comes the tricky bit:

  1. Since the data has to be separated with spaces, the max amount of items to check are at most: (n/2)+1, where n is the number of characters read in the upper point nº 2.
  2. Creates an array of long with max size the number obtained in point nº 1.
  3. Fills numbers[0] with the result of: strtol(strtok(line, delim), &end, 10).
  4. Adds 1 to the counter and enters a while loop:

    while((numbers[counter] = strtol(strtok(NULL, delim), &end, 10)) != NULL) {
        if(!*end) {
            // Check whether it's 0, 1, negative, prime or extract its factors
        }
        else {
            fprintf(stderr, "\"%s\" is not a non-negative integer", end)
        }
        counter++;
    }
    

Now, here are some inputs and their outputs:

Input: echo 1 2 3 4 5 6 7 8 9 10 | ./factors

Output:

1
2    
3
    2    2
5
    2    3
7 
    2    2    2
    3    3
    2    5
Segmentation Fault (core dumped). 

Input ./factors < integers.txt Where integers contains a COLUMN of integers.

Output:

All the integers are factorised just fine and at the end it prints a:

Segmentation Fault (core dumped). 

Input: echo abc 12 13 14 15 | ./factors

Output:

"abc" is not a non-negative integer
13
    2    7
    3    5
Segmentation Fault (core dumped). 

Input: echo -1 -2 -3 -4 -5 | ./factors

Output:

"-1" is not a non-negative integer
"-2" is not a non-negative integer
"-3" is not a non-negative integer
"-4" is not a non-negative integer
"-5" is not a non-negative integer

Input: echo abc abc abc | ./factors

Output:

"abc" is not a non-negative integer

(And does not continue checking).

Input: echo 3 4 0 6 7 | ./factors

Output:

3
    2    2

(And does not continue checking).

So as far as I can see, it fails when it encounters a 0, more than one instance of a non-integer or basically at the end of a healthy integer-based stream of data.

Any idea how I could tackle this while and why is it failing so apparently randomly?

I should let you know I'm new to C...

Thank you very much in advanced.

====================================================

EDIT1: As per request, here are the code fragments which generate numbers[], and read from stdin:

char *line;
char *end;
char *delim = " \n";
int charsread, counter; 

line = malloc(80);
charsread = read(0, line, 81);
if (charsread == 0) {
    return EX_OK;
}
realloc(line, charsread);
maxelem = (charsread / 2) + 1;
long numbers[maxelem];
numbers[0] = strtol(strtok(line, delim), &end, 10);
if (!*end) {
    zeroone(numbers[0]);
}
else {
    fprintf(stderr, "\"%s\" is not a non-negative integer\n", end);
}
counter = 1;
while [...]
Mat
  • 202,337
  • 40
  • 393
  • 406
JSG_ESP
  • 45
  • 1
  • 7
  • when `Input: echo abc abc abc | ./factors`, strtol an error occurs (and convert result == 0) return 0, `while(... != NULL)` break while loop. – BLUEPIXY Jun 07 '12 at 22:44
  • isn't a line missing as well in your second example between `"abc" is not a non-negative number` and `13`? Is that an actual error, or just from your re-transcription? – haylem Jun 07 '12 at 23:42
  • Also, what does `zeroone` do??? – haylem Jun 07 '12 at 23:59

3 Answers3

2

try using gdb to debug the segfault, make it dump a core when segfaulting by setting an appropriate environment variable, or run it in gdb directly. Segfault means that you are reading/writing a part of memory you shouldn't. Randomness means that you are probably smashing the stack or something. I think "printf"'s are to blame, check their arguments. You are also not checking that numbers is smaller than array length? it might overrun it?

Markus Mikkolainen
  • 3,397
  • 18
  • 21
  • also i suggest you partition these things into nice little functions and write unit tests to see if they do what you think they do. – Markus Mikkolainen Jun 07 '12 at 18:08
  • unit tests might be a bit of an unintuitive concept for the OP, as it's a homework thing. Not necessarily, but quite likely. – haylem Jun 07 '12 at 18:13
  • surely, but it might not be very clear what you meant, and doing a search to learn more about unit testing is generally quite frightening for newcomers as you end up with a herd of complicated frameworks. But for the general idea, I'm all with you. – haylem Jun 07 '12 at 18:17
  • First of all thank you for the answers. @MarkusMikkolainen if by unit testing you mean split up into functions, I have done so with the area that checks for 1/0, negativeness, primeness and the factor extraction. Do you consider it would be useful to also split up the sentences inside the while loop? – JSG_ESP Jun 07 '12 at 18:48
  • @MarkusMikkolainen & @haylem : By the way, I'm not entirely sure about the `if (!*end)` bit since I took it from another question on this site. Do you think that might be the case? How could printfs be blameful for the unintendedness of the code's behaviour? What do you mean checking if numbers is smaller than the array length? Seeing the output I can tell that it's picking (when it does pick them) every number in the stream. – JSG_ESP Jun 07 '12 at 18:53
  • that complex thing inside the while loop comparison expression could be tested. also try executing it with gdb to see what happens, you might actually see where it blows. (ie. learn to use gdb) – Markus Mikkolainen Jun 07 '12 at 18:54
  • you are incrementing "counter" but not checking it against numbers-array length. if you accidentally access memory beyond numbers it will segfault. – Markus Mikkolainen Jun 07 '12 at 18:54
  • @JSG_ESP: "Seeing the output I can tell that it's picking (when it does pick them) every number in the stream." yes, but we can also see when it tries to pick stuff it shouldn't. Show us more of the code, namely the bits creating `numbers` and reading from `stdin`. – haylem Jun 07 '12 at 18:57
  • @JSG_ESP: actually, I kind of bet that the segault comes from the `while ()` line, at the point where it will try to store something in `numbers[]` past its storage capacity. But we can't be sure without seeing more. – haylem Jun 07 '12 at 18:58
  • @haylem I saw you deleted the other answer: Here is what I wrote back: As for the first review: I know it's ridiculous to reallocate memory when handling 80 bytes max. I just wanted to do things "properly" :) About the second one: I can input numbers as big as I wish and it'll still pick them up nicely (as long as they're numbers !=0). If I were to input `echo 1234 5678 91011 12131415 | ./factors` it'd produce the same output: Successfully extract the factors of those 4 numbers, and print an ending `Segmentation Fault...`. – JSG_ESP Jun 07 '12 at 19:23
  • @JSG_ESP: yes, I had started to write something, then realized I was wrong and wouldn't have the time to fix it as I had to go. Seemed better to delete it to not confuse anyone. – haylem Jun 07 '12 at 23:34
  • @JSG_ESP: re-published with changes. Sorry about that. – haylem Jun 07 '12 at 23:50
  • @MarkusMikkolainen I have given a quick read to gdb. Seems quite a powerful tool to check for errors in a program! However, I'm having trouble using it while piping input to the program's `stdin`. Am I missing something? tried with `run < inputfile.txt`, tried with the `|`, but nothing seemed to work. Bummer, really, since my program ONLY accepts input from `stdin`. Thank you for pointing me into the `gdb`! – JSG_ESP Jun 08 '12 at 06:36
  • umm. i cannot give you the answer straight for that , but it must be possible to start a program with gdb so that it will accept from stdin. try searching the net for a manual and read through man gdb. – Markus Mikkolainen Jun 08 '12 at 07:15
  • http://stackoverflow.com/questions/455544/how-to-load-program-reading-stdin-and-taking-parameters-in-gdb – Markus Mikkolainen Jun 08 '12 at 07:18
  • @MarkusMikkolainen I tried the solution posted on that question at SO, but didn't work (or didn't write it correctly). Might be because I'm using Cygwin... I'll try that out today on a Fedora PC. Thank you very much :) – JSG_ESP Jun 08 '12 at 07:28
1

Ok, let's review a few things while we're there, and see if we can fix your problem(s).


This is not necessary in your program:

realloc(line, charsread);

It is correct to attempt to reduce your memory footprint, but if you allocated too much, so what? It's 80 bytes. Don't overcomplicate it you already have enough on your plate.


This is odd:

maxelem = (charsread / 2) + 1;
long numbers[maxelem];

This is fine and will work in your case, but you could determine the number of elements more accurately by counting the groups of numbers.


I'd recommend trying to do the whole extraction both in the lopp


Regarding the actual segmentation fault... I might be wrong as I have not the opportunity to reproduce this on my machine here, but I think the error occurs because you didn't terminate your line with a \0 character. In C, as I'm sure you already know by the look of the snippets you posted, lines are usually what we call "NULL-terminated", so by convention we terminate them with the value 0, which is the numerical value for of the NUL character, also represented as \0 (which is sometimes a bit confusing, as then people get confused between "NULL-terminated", the "NUL" char, and "NULL pointers", which are a different thing).

This then makes your program break, because eventually your program tries to read the end of the line with strtok but it doesn't know where to stop. It has no conscience of the buffer length and of where to stop in this buffer. So it keeps reading, reaching a memory address it isn't allowed to access, and thus comes up a segmentation fault.

So you'd want to simply:

/*
** If you keep your realloc, you need to allocate for the number of read
** characters from stdin, and for an extra char to terminate.
*/
line = realloc(line, charsread + 1); 
/*
** Terminate the string.
*/
line[charsread] = '\0';

Update: Ah, you were actually almost there, you had the logic but probably just missed this bit... You even wrote this yourself:

Stores in an int, through read() the number of characters read (and I believe, the last \0).

Which is partly true. If you had really gotten a line of 80 chars, your read call would have returned the line with a \0 at the end. But most of the time you have less, so your read buffer only read visible characters, and you need to null-terminate the string yourself.


I'd also try to rewrite your processing loop to also do the first initial call to strtok as part of it. Not always handy to write, but usually seeing a block of code nearly identical to what's inside the loop just before or just after a loop makes me think there's a better logical approach.

haylem
  • 22,460
  • 3
  • 67
  • 96
0

Thank you very much for your insightful response.

In the end I just rewrote the whole thing up since it didn't look as clean as I expected. Since the manual from strtok specifies a return value of NULL if no token could be extracted, this is what I ended up with:

long number;
item = strtok(line, delim);
while (item != NULL) {
       number = strtol(item, &rest, 10);
       if (*rest == 0) {
           zeroOne(number);
       }
       else {
           fprintf(stderr, "\"%s\": not a non-negative int.\n", item);
       }
       item = strtok(NULL, delim);
}   

Seems a cleaner approach and does consider the fact that the returned value from the strtok when it is NULL is returned BEFORE it tries to enter the while loop I had before. Then, this morning I came back here and read your response :)

A follow-up question regarding your hard-coded input of a \0 to the line read: Does this mean that my last strtok actually outputs the \0 token and tries to enter the loop, or does it return a NULL when it reaches the entity right after my last introduced character? As a convention, when using read() (or maybe other reading functions such as fgets() should I always try to hard-code a \0 to the line read to allow other functions to check for EOL / EOF?

If someone else should be having trouble using any of these functions (strtok and strtol) I recommend to check out these two questions posted on this very site:

Regarding strtol's second argument: Strtol second argument

Regarding strtok's output: Cannot concatenate strtok's output variable. strcat and strtok

Community
  • 1
  • 1
JSG_ESP
  • 45
  • 1
  • 7
  • Sorry for the delay, but we don't get notifications for new answers (you do, because you posted the question). I only noticed because you unaccepted my answer :) Regarding your questions, `strtok` will return `NULL` when there's no more token to be found and the end of the string as been reached. So strtok won't return the `\0`. It is indeed conventionnal when using byte reading functions to add the trailing `\0`, but it depends on the function, so don't consider it a habit that all functions will do this for you (for instance, `strcpy` will, while `strncpy` may not). Glad you worked it out. – haylem Jun 11 '12 at 00:48