I am new to C programming and was assigned to create command-line tools to run from the terminal on UNIX. So far, I've implemented ways to print a file and to print it in reverse order. The last part requires specifying a line number and displaying that line (e.g. ./showfile -l(2) textfile).

My problem is recognizing the input. I decided to use regex for this task, but I can't find a way to match the pattern -l(i) in argv[1].

EDIT: Everything I did:

void reversetext(FILE * f);
void main(int argc, char * argv[]) {
    regex_t regex;
    int i;
    FILE * f;
    char c;

    if (regcomp( & regex, "-l[[digit:]]+", 0)) {
        printf("Could not compile\n");
        return;
    }

    if (strcmp(argv[1], "-r") == 0) {
        f = fopen(argv[2], "r");
        reversetext(f);
    } else if (regexec( & regex, argv[1], 0, NULL, 0) == 0) {
        printf("%s", argv[1]);
    } else {
        f = fopen(argv[1], "r");
        c = getc(f);

        while (c != EOF) {
            printf("%c", c);
            c = getc(f);
        }
    }

    fclose(f);
}

void reversetext(FILE * f) {
    char c = getc(f);
    if (c == EOF) {
        return;
    }

    reversetext(f);
    printf("%c", c);
}

Why am I getting a segmentation fault? I read some previous posts, but none of the users faced this error with POSIX.

NOTE: I have included the headers I need above main.

Code Explanation: ./showfile -r text.txt >> views in reverse

Second if-statement to specify line

else: print normally.

Atieh
  • Post a short, compilable example with a `main()` function and an initialized buffer instead of `argv[1]`, so we can try it out. – Crowman Oct 19 '13 at 21:18
  • @PaulGriffiths why should I use a buffer instead? my code requires argv[1] and I don't know where to use my buffer in this case. – Atieh Oct 19 '13 at 21:32
  • For the purposes of asking your question, because it eliminates any possibilities of you not calling your program correctly, and it helps us see exactly what you're passing to `regexec()`. That being said, see my answer for the solution to your problem. – Crowman Oct 19 '13 at 21:45

2 Answers

This'll work for you:

#define _POSIX_C_SOURCE 200809L

#include <stdio.h>
#include <stdlib.h>
#include <regex.h>

int main(void) {
    const char tests[2][4] = {"-l4", "-lm"};
    const char match[] = "-l[[:digit:]]+";
    regex_t rmatch;

    if ( regcomp(&rmatch, match, REG_EXTENDED) != 0 ) {
        fprintf(stderr, "Error compiling regex\n");
        return EXIT_FAILURE;
    }

    for ( int i = 0; i < 2; ++i ) {
        if ( regexec(&rmatch, tests[i], 0, NULL, 0) != 0 ) {
            printf("No match for '%s'.\n", tests[i]);
        } else {
            printf("Matched '%s'.\n", tests[i]);
        }
    }

    return 0;
}

Output:

paul@local:~/src/c/scratch$ ./regex
Matched '-l4'.
No match for '-lm'.
paul@local:~/src/c/scratch$

EDIT: In the code you posted, you've got a couple of problems:

  1. This line:

    if(regcomp(&regex,"-l[[digit:]]+",0)){
    

    should be:

    if( regcomp(&regex, "-l[[:digit:]]+", REG_EXTENDED) ) {
    

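    since the character class is spelled [[:digit:]] (note the leading colon) and + is a repetition operator only in extended regular expressions, which REG_EXTENDED selects. If you change this line, your pattern will successfully match.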

  2. Your segmentation fault is actually nothing to do with your regular expressions, and comes from calling this:

    fclose(f);
    

    on an execution path where you never successfully opened a file. The same applies to passing the result of a failed fopen() straight to getc(). You should change:

    FILE *f;
    

    to:

    FILE *f = NULL;
    

    and change:

    fclose(f);
    

    to:

    if ( f ) {
        fclose(f);
    }
    

    Making yourself familiar with gdb will go a long, long way towards getting you able to track these things down yourself.

Here's a modified version of your own code that'll work and includes some basic error-checking:

#define _POSIX_C_SOURCE 200809L

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <regex.h>

void reversetext(FILE * f);

int main(int argc, char *argv[]) {
    regex_t rmatch;
    FILE *f = NULL;
    int c;

    if ( argc < 2 ) {
        printf("You need to enter at least one command line argument.\n");
        return EXIT_FAILURE;
    }

    if ( regcomp(&rmatch, "-l[[:digit:]]+", REG_EXTENDED) ) {
        printf("Could not compile regex.\n");
        return EXIT_FAILURE;
    }

    if ( strcmp(argv[1], "-r") == 0 && argc > 2 ) {
        printf("argv[1] is -r\n");
        if ( (f = fopen(argv[2], "r")) == NULL ) {
            fprintf(stderr, "Couldn't open file %s\n", argv[2]);
            return EXIT_FAILURE;
        }
        reversetext(f);
    } else if (regexec(&rmatch, argv[1], 0, NULL, 0) == 0) {
        printf("Matched '%s' to regex\n", argv[1]);
    } else {
        if ( (f = fopen(argv[1], "r")) == NULL ) {
            fprintf(stderr, "Couldn't open file %s\n", argv[1]);
            return EXIT_FAILURE;
        }

        while ( (c = getc(f)) != EOF) {
            printf("%c", c);
        }
    }

    if ( f ) {
        fclose(f);
    }    
}

void reversetext(FILE * f) {
    int c = getc(f);
    if (c == EOF) {
        return;
    }

    reversetext(f);
    printf("%c", c);
}

Output:

paul@local:~/src/c/scratch$ ./regex2 -l4
Matched '-l4' to regex
paul@local:~/src/c/scratch$ ./regex2 -r fakefile
argv[1] is -r
Couldn't open file fakefile
paul@local:~/src/c/scratch$ ./regex2 -tribbles
Couldn't open file -tribbles
paul@local:~/src/c/scratch$ ./regex2 testfile
This is a test.
paul@local:~/src/c/scratch$ ./regex2 -r testfile
argv[1] is -r

.tset a si sihTpaul@local:~/src/c/scratch$

Note that when you're using getc() and friends, they use ints, not chars. This is necessary in order to be able to store EOF.

EDIT 2: Per the question in your comment, you need to do four things to match a sub-group, in this case, the numeric part of the match.

  1. Set up an array of type regmatch_t. You'll need at least two elements, since the first will match the entire regex, and you'll need at least a second for one sub-group. In the code below, I've added:

    #define MAX_MATCHES 10
    regmatch_t m_group[MAX_MATCHES];
    
  2. Put parentheses around the part of the regex you want to extract. In the code below, I've changed:

    "-l[[:digit:]]+"
    

    to:

    "-l([[:digit:]]+)"
    
  3. Pass your regmatch_t array to regexec() when you call it, along with the size. In the code below, I've changed:

    } else if (regexec(&rmatch, argv[1], 0, NULL, 0) == 0) {
    

    to:

    } else if (regexec(&rmatch, argv[1], MAX_MATCHES, m_group, 0) == 0) {
    
  4. Cycle through the array and deal with each match. Whenever the rm_so member of a regmatch_t array element is not -1, that element holds a match. All I'm doing here is copying them to a buffer and printing them out:

    } else if ( regexec(&rmatch, argv[1], MAX_MATCHES, m_group, 0) == 0 ) {
        printf("Matched '%s' to regex\n", argv[1]);
        for ( int i = 0; i < MAX_MATCHES && m_group[i].rm_so != -1; ++i ) {
            char buffer[1000] = {0};
            char * match_start = &argv[1][m_group[i].rm_so];
            size_t match_size = m_group[i].rm_eo - m_group[i].rm_so;
            size_t match_len = match_size > 999 ? 999 : match_size;
            strncpy(buffer, match_start, match_len);
            printf("Matched group %d was '%s'\n", i, buffer);
        }
    } 
    

Here's updated code:

#define _POSIX_C_SOURCE 200809L

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <regex.h>

#define MAX_MATCHES 10

void reversetext(FILE * f);

int main(int argc, char *argv[]) {
    regex_t rmatch;
    regmatch_t m_group[MAX_MATCHES];
    FILE *f = NULL;
    int c;

    if ( argc < 2 ) {
        printf("You need to enter at least one command line argument.\n");
        return EXIT_FAILURE;
    }

    if ( regcomp(&rmatch, "-l([[:digit:]]+)", REG_EXTENDED) ) {
        printf("Could not compile regex.\n");
        return EXIT_FAILURE;
    }

    if ( strcmp(argv[1], "-r") == 0 && argc > 2) {
        printf("argv[1] is -r\n");
        if ( (f = fopen(argv[2], "r")) == NULL ) {
            fprintf(stderr, "Couldn't open file %s\n", argv[2]);
            return EXIT_FAILURE;
        }
        reversetext(f);
    } else if ( regexec(&rmatch, argv[1], MAX_MATCHES, m_group, 0) == 0 ) {
        printf("Matched '%s' to regex\n", argv[1]);
        for ( int i = 0; i < MAX_MATCHES && m_group[i].rm_so != -1; ++i ) {
            char buffer[1000] = {0};
            char * match_start = &argv[1][m_group[i].rm_so];
            size_t match_size = m_group[i].rm_eo - m_group[i].rm_so;
            size_t match_len = match_size > 999 ? 999 : match_size;
            strncpy(buffer, match_start, match_len);
            printf("Matched group %d was '%s'\n", i, buffer);
        }
    }  else {
        if ( (f = fopen(argv[1], "r")) == NULL ) {
            fprintf(stderr, "Couldn't open file %s\n", argv[1]);
            return EXIT_FAILURE;
        }

        while ( (c = getc(f)) != EOF) {
            printf("%c", c);
        }
    }

    if ( f ) {
        fclose(f);
    }
}

void reversetext(FILE * f) {
    int c = getc(f);
    if (c == EOF) {
        return;
    }

    reversetext(f);
    printf("%c", c);
}

Outputs:

paul@local:~/src/c/scratch$ ./regex2 -l4
Matched '-l4' to regex
Matched group 0 was '-l4'
Matched group 1 was '4'
paul@local:~/src/c/scratch$
Crowman
  • if you notice my posted code, it's the same I'm doing, but instead of tests[i], I am placing argv[1] which should be -l(i) but nothing is printing out..instead an error – Atieh Oct 19 '13 at 21:45
  • @Atieh: It's not the same - you're not passing `REG_EXTENDED` to `regcomp()` in the code you posted. – Crowman Oct 19 '13 at 21:46
  • it worked! thank you so much! been researching this for the past 5 hours. Allow me to go on with acing this assignment :) – Atieh Oct 19 '13 at 21:52
  • @Atieh: No problem. I posted a working version of your code in my edit, along with some proper error checking. – Crowman Oct 19 '13 at 21:52
  • btw, do you know how can I extract the 2 in -l2 for example? the 2 will specify the line number. – Atieh Oct 19 '13 at 22:55
  • @Atieh: Yes, see my latest edit. You can use `strtol()` to turn it into an actual number, after how I've extracted the string. – Crowman Oct 19 '13 at 23:19
  • ok all is good. I saw your edit and noticed the () around the :digit:, I forgot about extracting a sub-pattern from a match and store it in a register. I have an idea of how the pmatch[] or m_group[] (in your case) works! They taught us of using {1}{2} or \1,\2...\n to store in registers. This is great! – Atieh Oct 20 '13 at 08:06
  • so basically, m_group[1] will always be my number. – Atieh Oct 20 '13 at 08:17
  • When you have one single group, yes, it will be. – Crowman Oct 20 '13 at 13:18
If you need help with patterns, this might help: http://www.cheatography.com/davechild/cheat-sheets/regular-expressions/

As for the segmentation fault, more detail is required.

This is also a working example; you only need to replace the pattern: Regular expressions in C: examples?

Novin Shahroudi