
Is there a library function or generally accepted method for recognizing a token that contains a single quote inside double quotes, while still treating text in single quotes (outside of double quotes) as a token?

For example, the string: "Bill's Pot" 'Roast' should result in the tokens:

Bill's Pot
Roast
Jonathan Leffler
Ali Khan
  • 1
    You can try with a regular expression although the sample string has no visible problem so I don't understand what your question is? Just capture every character found between two `"` characters? – Iharob Al Asimi Mar 12 '15 at 01:33
  • @iharob Sorry, the newline wasn't being shown correctly – Ali Khan Mar 12 '15 at 01:58
  • 1
    There isn't a standard function that interprets strings like that. There's also, always, the problem of "how do you get a single quote and a double quote into a single token"? Do you use doubled quotes (so `""` in the middle of a string surrounded by double quotes maps to a single instance of a double quote), or do you use another escape character (classically, the backslash, `\`). Etc. Such decisions can be implemented, but there isn't a C standard or POSIX standard function that specifically copes with such parsing. – Jonathan Leffler Mar 12 '15 at 02:09
  • 2
    Process the input one `char` at a time moving between 1 of 4 states: white-space, in a word, in a "" phrase and in a '' phrase. – chux - Reinstate Monica Mar 12 '15 at 02:23
  • If you're doing manual parsing and `'` strings should be able to contain `"` characters too, then you could just look for either quote mark, remember which one it was, and then scan for the same quote mark (with e.g. `strchr()`, assuming the string is null-terminated to make it safe for malformed input). Would be easy to do with common code for both `"` and `'` strings. Features like backslash-escaping and not wanting newlines in strings would need a bit more complex code. – Ulfalizer Mar 12 '15 at 03:14

1 Answer

2

There isn't a library function that does this specifically, but several library functions can help you do it yourself. `strchr` returns a pointer to the first occurrence of a character of your choice within a string (or `NULL` if it doesn't occur), and `isspace` detects whitespace characters, which delimit unquoted strings. Note that `isspace` depends on the current locale. If you only want to skip the whitespace characters defined by the "C" locale, call `strspn` with a second argument of `" \f\n\r\t\v"` (note the space character at the start of that string) instead of writing a loop that calls `isspace` repeatedly.
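As a rough illustration of the `strspn` suggestion, here is a minimal sketch. The names `C_SPACES`, `skip_spaces` and `word_length` are made up for this example, and the use of `strcspn` to measure an unquoted word is my own addition rather than something the code below relies on:

```c
#include <string.h>

/* Whitespace set of the "C" locale; note the leading space character. */
static const char C_SPACES[] = " \f\n\r\t\v";

/* Hypothetical helper: returns a pointer to the first character of s
   that is not in C_SPACES (possibly the terminating '\0'). */
static const char *skip_spaces(const char *s)
{
    return s + strspn(s, C_SPACES);
}

/* Hypothetical helper: length of the unquoted word starting at s,
   i.e. the run of characters up to the next whitespace or the end. */
static size_t word_length(const char *s)
{
    return strcspn(s, C_SPACES);
}
```

Calling `skip_spaces` on a string with leading blanks yields a pointer to its first non-blank character, and `word_length` then tells you how many characters to print with `%.*s`.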

Here's one way to parse your sample string, with the additional rule that C-style backslash escapes allow embedded quotation marks. Note that it only finds where each token begins and ends. It doesn't remove the backslash from an escaped quotation mark or transform the token in any other way:

// Needs <ctype.h>, <stdio.h> and <string.h>; the statements below belong inside a function.
char str[] = "\"Bill's Pot\" 'Roast'";
char *start;
char *end;

start = str;
while (*start) {
    // Skip leading spaces.
    while (isspace((unsigned char)*start))
        ++start;

    // Double-quoted string with backslash escapes.
    if (*start == '"') {
        end = strchr(++start, '"');
        while (end != NULL && *end == '"' && end[-1] == '\\')
            end = strchr(++end, '"');
        if (end == NULL || *end == '\0') {
            fprintf(stderr, "Unterminated double-quoted string -- %s\n", --start);
            break;
        }
    }

    // Single-quoted string with backslash escapes.
    else if (*start == '\'') {
        end = strchr(++start, '\'');
        while (end != NULL && *end == '\'' && end[-1] == '\\')
            end = strchr(++end, '\'');
        if (end == NULL || *end == '\0') {
            fprintf(stderr, "Unterminated single-quoted string -- %s\n", --start);
            break;
        }
    }

    // Unquoted (space-delimited) string.
    else if (*start != '\0') {
        end = start + 1;
        while (*end != '\0' && !isspace((unsigned char)*end))
            ++end;
    }

    // Nothing left but the terminating null character, so stop.
    else
        break;

    // The precision argument for %.*s must be an int, not a ptrdiff_t.
    printf("%.*s\n", (int)(end - start), start);

    // Quotes must be skipped before continuing parsing.
    if (*end == '\'' || *end == '"')
        ++end;

    // Get ready to start the next round of parsing.
    start = end;
}
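The two quoted branches above differ only in the quote character, and the `end[-1] == '\\'` test treats the closing quote in input like `"a\\"` as escaped even though the backslash before it is itself escaped. A possible way to address both is a shared helper that walks the string and skips the character after every backslash. This is only a sketch, and `find_closing_quote` is a hypothetical name, not a library function:

```c
#include <stddef.h>

/* Hypothetical helper: s points just past an opening quote character.
   Returns a pointer to the matching unescaped closing quote, or NULL
   if the string ends before one is found. */
static char *find_closing_quote(char *s, char quote)
{
    for (; *s != '\0'; ++s) {
        if (*s == '\\' && s[1] != '\0')
            ++s;                /* skip the escaped character */
        else if (*s == quote)
            return s;
    }
    return NULL;
}
```

In the loop above, `end = find_closing_quote(++start, '"');` would take the place of the `strchr` call and the inner `while` loop, with the same `NULL` check for an unterminated string.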

You could also avoid the string library functions and simply do your own parsing, which lets you decide how to handle input such as `Bill"'s Pot"`. Should it be the single string `Bill's Pot` or the two strings `Bill` and `'s Pot`? There are also other ways to escape quotation marks, other string delimiters besides single and double quotation marks, and quoting rules à la POSIX sh that allow newlines inside a string, so that the opening quote and the closing quote are on two different lines, which C forbids. In that last case the C string functions alone aren't enough, since you need a state variable to indicate that you're inside a single-quoted or double-quoted string.

That should give you an idea of what @JonathanLeffler meant in his comment: there are many different quoting rules. Hopefully the code I've provided will give you some idea of how to go about doing what you want.
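For comparison, a hand-written parser with a state variable might look roughly like the following. It is a sketch under assumptions of my own: the name `split_quoted` is hypothetical, quotes are removed and adjacent quoted and unquoted parts are joined into one token (so `Bill"'s Pot"` becomes `Bill's Pot`, the sh-like choice), and backslash escapes are not handled at all:

```c
#include <ctype.h>
#include <stddef.h>

enum state { BETWEEN, WORD, IN_DOUBLE, IN_SINGLE };

/* Hypothetical function: splits buf in place and stores up to max token
   pointers in out. Returns the number of tokens stored, or -1 if a quote
   is left unterminated. Tokens beyond max are silently ignored. */
static int split_quoted(char *buf, char **out, int max)
{
    enum state st = BETWEEN;
    char *w = buf;      /* write position; quotes are dropped, so w <= r */
    int n = 0;

    for (char *r = buf; ; ++r) {
        char c = *r;
        switch (st) {
        case BETWEEN:
            if (c == '\0')
                return n;
            if (isspace((unsigned char)c))
                break;
            if (n == max)
                return n;
            out[n++] = w;       /* a new token starts here */
            st = WORD;
            /* fall through */
        case WORD:
            if (c == '"') {
                st = IN_DOUBLE;
            } else if (c == '\'') {
                st = IN_SINGLE;
            } else if (c == '\0' || isspace((unsigned char)c)) {
                *w++ = '\0';    /* terminate the current token */
                if (c == '\0')
                    return n;
                st = BETWEEN;
            } else {
                *w++ = c;
            }
            break;
        case IN_DOUBLE:
        case IN_SINGLE:
            if (c == '\0')
                return -1;      /* unterminated quote */
            if (c == (st == IN_DOUBLE ? '"' : '\''))
                st = WORD;      /* closing quote: back to the bare word */
            else
                *w++ = c;       /* the other quote type is ordinary text */
            break;
        }
    }
}
```

Because the buffer is modified in place, it has to be a writable array such as `char str[] = "\"Bill's Pot\" 'Roast'";` rather than a string literal. Reading input that spans several lines would mean keeping `st` alive between calls, which this sketch doesn't attempt.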

Community