-1

I was trying to figure out how the below function works, and for the most part it's coming along fine (so I guess learning about how C works along the way as well). However, I've run into an issue. On the line that has

*q = 0, chr = p, ...

If I do that in a separate, sample code, I get a segmentation fault. I know that the program that the below function is from works, and that it uses this function. I'm wondering what I'm missing. This answer seems to suggest that I'm trying to edit something I can't edit, what is the difference in the below function?

static void parse_chr(struct hk_sdict *d, char *s) // io.c 282, *d is hk_map -> d
{
    char *chr, *p, *q;
    int64_t len;
    int has_digit, prelen;
    prelen = strncmp(s, "#chromsize:", 11) == 0? 11 : 12;
    for (p = s + prelen; isspace(*p) && *p != 0; ++p) {}
    assert(*p);
    for (q = p; *q != 0 && !isspace(*q); ++q) {}
    assert(*q);
    *q = 0, chr = p, p = q + 1;
    // ... there is more to this function, but didn't seem relevant to the question, it manipulates the resulting p and chr from here via other functions in the program
}

A sample of text that the above function would be looking at is are lines in a text file such as:

#chromsize: chr1    249250621
#chromsize: chr10   135534747
...

Updates on s:

s is defined in a previous data structure as such:

typedef struct __kstring_t {
    size_t l, m;
    char *s;
} kstring_t

later on, a kstring_t is initialized like so:

kstring_t str = {0,0,0};

Then later on, given an size_t value m (which is stored in the same structure str) we have

str->s = (char*)realloc(str->s, str->m);

This is then updated later on with (given an size_t l, unsigned char *buf, int begin, int i):

memcpy(str->s + str->l, ks->buf + ks->begin, i - ks->begin);

This is the processing done to s, and then it is passed to the above function.

Here is my attempt (I'm not sure if all of the includes are necessary, just wanted to replicate packages which are available to the original script) (this gives me a segmentation fault (core dumped)):

#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <string.h>
#include <ctype.h>
#include <assert.h>
#include <zlib.h>

int main ()
{
    char *s = "#chromsize: chr1 249250621";
    char *p, *q, *chr;
    for (p = s + 11; isspace(*p) && *p != 0; ++p) {}
    assert(*p);
    for (q = p; *p != 0 && !isspace(*p); ++p) {}
    assert(*q);
    *q = 0;
    chr = p, p = q + 1;
}
Chris
  • 153
  • 7
  • How is the function called? Is `s` a writable string? – Barmar Sep 21 '20 at 19:43
  • yes, `s` is writable (like editable, do you mean? If so, yes) – Chris Sep 21 '20 at 19:45
  • "editable" is not a C concept. It should be an array, not a string literal. – Barmar Sep 21 '20 at 19:51
  • Any reason why you're not using `strtok()`? – Barmar Sep 21 '20 at 19:52
  • `s` is initially declared as `char *s;`. Also I'm not sure exactly why they don't use that... it seems like it would be more straightforward than this method – Chris Sep 21 '20 at 19:54
  • 1
    `prelen = ... 12;` is a problem when `s` is short. Post a [mcve] – chux - Reinstate Monica Sep 21 '20 at 19:54
  • `s` will, in this code, never be that short. The `prelen = 11/12` is there to skip the part that says `#chromsize`. Here, if it doesn't say `#chromsize` at that point in `s`, then this function isn't called. Sorry, I suppose I should have included other info, but wasn't sure what important things to leave out/in – Chris Sep 21 '20 at 19:56
  • https://ericlippert.com/2014/03/05/how-to-debug-small-programs/ – Barmar Sep 21 '20 at 19:59
  • If the caller does `char *s = "some string";` then `s` is not writable because it's a string literal. – Barmar Sep 21 '20 at 20:00
  • I filled in more detailed information about `s`, I hope that clarifies some; I had the wrong type for some stuff in the edits - just edited – Chris Sep 21 '20 at 20:05
  • I don't think I'm following what exactly it is that fails for you. What "separate, sample code"? Have you presented different code than what produces an error for you? And with what are you asking us to compare the function presented? Generally speaking, we want to see a [mre] in debugging questions such as this seems to be. In part, that helps us test things for ourselves, but in part, it addresses the issue that question posters are mistaken fairly often about the nature and location of the problems they ask about. – John Bollinger Sep 21 '20 at 20:14
  • Sorry, I should have included that originally. I put it at the end of the post – Chris Sep 21 '20 at 20:20
  • Does this answer your question? [Why do I get a segmentation fault when writing to a "char \*s" initialized with a string literal, but not "char s\[\]"?](https://stackoverflow.com/questions/164194/why-do-i-get-a-segmentation-fault-when-writing-to-a-char-s-initialized-with-a) – Random Davis Sep 21 '20 at 20:42
  • A C string literal refers to an array that is not modifiable (more precisely, attempting to modify it causes undefined behavior; a seg fault is typical). Logically it would have made more sense to make string literals `const`, but that would have broken existing code. (C++ does make string literals `const`.) – Keith Thompson Sep 21 '20 at 20:54
  • Every comment and answer(1) is mostly just para-phrasing from this style document. Why not just read it for yourself to get the complete picture with char's and C. https://www.cs.uic.edu/~jbell/CourseNotes/C_Programming/CharacterStrings.html –  Sep 25 '20 at 15:28

1 Answers1

1

Here is my attempt (I'm not sure if all of the includes are necessary,

For what it's worth, only ctype.h and assert.h are necessary for the example program.

just wanted to replicate packages which are available to the original script)

You sound like a Python programmer. #include directives do not make "packages" available to anything. Packages are not a C concept. Generally speaking, headers provide declarations of functions and occasionally variables that are assumed to be defined externally to the program, and often define one or more preprocessor macros. It is necessary for functions to be declared before they are called and for variables to be declared before they are accessed, but that is not sufficient. The function implementation and / or variable definition also has to be linked to the program. That's (usually) automatic for members of the C standard library, but not for other external functions.

(this gives me a segmentation fault (core dumped)):

#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <string.h>
#include <ctype.h>
#include <assert.h>
#include <zlib.h>

int main ()
{

Here is the main issue:

    char *s = "#chromsize: chr1 249250621";

That declares s to be a pointer, initially pointing to the first character in a string literal. String literals must not be modified. Attempts to do so produce undefined behavior, of which a segfault is an eminently plausible manifestation.

Blockquote

    char *p, *q, *chr;

Note that here ...

    for (p = s + 11; isspace(*p) && *p != 0; ++p) {}

... p, also a pointer, is set initially to point to the character 11 past the one to which s points, which is also in the literal (and the pointer addition would produce UB, too, if it produced a result pointing more than one character past the end of the literal).

    assert(*p);

Similarly, here ...

    for (q = p; *p != 0 && !isspace(*p); ++p) {}

... q is set to p (pointing into the string literal).

    assert(*q);

Therefore, when you do this:

    *q = 0;

... you are trying to modify the literal. Oops. That has nothing to do with whether that assignment is performed in a separate statement expression, as above, or inside a larger expression such as

*q = 0, chr = p, ...

You can enlist the compiler's help to detect such issues by ensuring that you do not assign string literals to pointers of type char *. Instead, use type const char *. It is an unfortunate quirk of C's development history that C does not require that in the first place.

You can fix the example program by declaring s differently:

    char s[] = "#chromsize: chr1 249250621";

That gives you a separate, modifiable, array of char, whose length and initial contents match the specified string literal.

John Bollinger
  • 160,171
  • 8
  • 81
  • 157
  • Thank you for your help. I realize that what here is a string array, although quite confused how it works in the original code. Although I'll take it as-is that it works... I'm able to run this at least now, and see how it is operating. – Chris Sep 21 '20 at 21:22
  • And yes indeed, I started in Python, it seems I carry some misconceptions as a result – Chris Sep 21 '20 at 21:26