Compare two alphanumeric strings

Question

I need to compare strings in the following way. Can anyone suggest an approach or an algorithm in C++? For example:

 "a5" < "a11"        - because 5 is less than 11
 "6xxx" < "007asdf"  - because 6 < 7
 "00042Q" < "42s"    - because Q < s alphabetically
 "6   8" < "006 9"   - because 8 < 9

score 3 · Answer 1 · answered Apr 18 '12 at 05:29

I suggest you look at the algorithm strverscmp uses - indeed it might be that this function will do the job for you.

What this function does is the following. If both strings are equal, return 0. Otherwise find the position between two bytes with the property that before it both strings are equal, while directly after it there is a difference. Find the largest consecutive digit strings containing (or starting at, or ending at) this position. If one or both of these is empty, then return what strcmp(3) would have returned (numerical ordering of byte values). Otherwise, compare both digit strings numerically, where digit strings with one or more leading zeros are interpreted as if they have a decimal point in front (so that in particular digit strings with more leading zeros come before digit strings with fewer leading zeros). Thus, the ordering is 000, 00, 01, 010, 09, 0, 1, 9, 10.
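To try this out, here is a minimal sketch that wraps strverscmp. Note that strverscmp is a GNU extension, so this assumes glibc or a compatible C library (g++ on Linux exposes it by default); the wrapper name version_order is made up for illustration.

```cpp
#include <string.h>   // strverscmp: GNU extension, not standard C or C++
#include <string>

// Hypothetical helper: normalizes strverscmp's result to -1, 0 or 1.
int version_order(const std::string& a, const std::string& b) {
    int r = strverscmp(a.c_str(), b.c_str());
    return (r > 0) - (r < 0);
}
```

With the examples from the question, version_order("a5", "a11") is negative as wanted. However, version_order("6xxx", "007asdf") is positive, because digit strings with leading zeros sort before those without. So strverscmp may not match the question's rule for leading zeros, and it is worth checking against your own data first.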

score 2 · Answer 2 · edited May 23 '17 at 12:01

Your examples only show digits, letters, and spaces. So for the moment I'll assume you ignore every other symbol (effectively treat them as spaces). You also seem to want to treat uppercase and lowercase letters as equivalent.

It also appears that you interpret runs of digits as a "term" and runs of letters as a "term", with any transition between a letter and a digit being equivalent to a space. A single space is considered equivalent to any number of spaces.

(Note: You are conspicuously missing an example of what to do in cases like:

"5a" vs "a11"
"a5" vs "11a"

So you have to work out what to do when you face a comparison of a numeric term with a string term. You also don't mention intrinsic equalities...such as should "5 a" == "5a" just because "5 a" < "5b"?)

One clear way of doing this would be to turn each string into a std::vector of "terms", and then compare these vectors (rather than trying to compare the strings directly). Each term would be either numeric or a string. This might help get you started, especially the STL answer:

how to split a string value that contains characters and numbers

Trickier methods that worked on the strings themselves without making an intermediary will be faster in one-off comparisons. But they'll likely be harder to understand and modify, and perhaps slower if you are going to repeatedly compare the same structures.

A nice aspect of parsing into a structure is that you get an intrinsic "cleanup" of the data in the process. Getting the information into a canonical form is often a goal in programs that are tolerating such a variety of inputs.
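The term-vector idea above could be sketched roughly as follows. This is only an illustration under the assumptions stated in this answer: letters are case-insensitive, anything that is not a letter or digit acts as a separator, and leading zeros are dropped. The names Term, split_terms and compare_terms are made up, and sorting numeric terms before letter terms is an arbitrary choice, since the question does not say what should happen there.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// One "term": either a run of digits or a run of letters.
struct Term {
    bool numeric;       // true for a digit run
    std::string text;   // digits without leading zeros, or lowercased letters
};

// Split a string into terms; every other character is a separator.
std::vector<Term> split_terms(const std::string& s) {
    std::vector<Term> terms;
    std::size_t i = 0;
    while (i < s.size()) {
        unsigned char c = static_cast<unsigned char>(s[i]);
        if (std::isdigit(c)) {
            std::size_t j = i;
            while (j < s.size() && std::isdigit(static_cast<unsigned char>(s[j]))) ++j;
            while (i + 1 < j && s[i] == '0') ++i;   // drop leading zeros, keep one digit
            terms.push_back({true, s.substr(i, j - i)});
            i = j;
        } else if (std::isalpha(c)) {
            std::string word;
            while (i < s.size() && std::isalpha(static_cast<unsigned char>(s[i]))) {
                word += static_cast<char>(std::tolower(static_cast<unsigned char>(s[i])));
                ++i;
            }
            terms.push_back({false, word});
        } else {
            ++i;   // separator
        }
    }
    return terms;
}

// Compare two terms; returns -1, 0 or 1.
int compare_term(const Term& a, const Term& b) {
    if (a.numeric != b.numeric)
        return a.numeric ? -1 : 1;   // assumption: numbers sort before words
    if (a.numeric && a.text.size() != b.text.size())
        return a.text.size() < b.text.size() ? -1 : 1;   // more digits means larger
    int c = a.text.compare(b.text);
    return (c > 0) - (c < 0);
}

// Compare two strings term by term; returns -1, 0 or 1.
int compare_terms(const std::string& s, const std::string& t) {
    std::vector<Term> a = split_terms(s), b = split_terms(t);
    std::size_t n = std::min(a.size(), b.size());
    for (std::size_t i = 0; i < n; ++i) {
        int c = compare_term(a[i], b[i]);
        if (c != 0) return c;
    }
    if (a.size() == b.size()) return 0;
    return a.size() < b.size() ? -1 : 1;
}
```

Digit runs are compared by length and then as text, so arbitrarily long numbers do not overflow an integer type. Under these rules "5 a" and "5a" compare equal, which is one possible answer to the equality question raised above.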

Daniel Näslund · Accepted Answer · 2012-04-18T12:09:02.037

I'm assuming that you want the compare to be done in this order: presence of digits in range 1-9; value of digits; number of digits; value of the string after the digits.

It's in C, but you can easily adapt it to use the C++ std::string class.

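#include <string.h>
#include <strings.h>   /* strcasecmp (POSIX) */

/* True for the digits 1-9 only, so leading zeros are skipped.
   Renamed from isdigit to avoid clashing with the standard isdigit(). */
static int is_nonzero_digit(int c)
{
    return c >= '1' && c <= '9';
}

/* Count the nonzero digits in s. */
static int ndigits(const char *s)
{
    int i, nd = 0;
    int n = strlen(s);

    for (i = 0; i < n; i++) {
        if (is_nonzero_digit(s[i]))
            nd++;
    }
    return nd;
}

/* Returns <0, 0 or >0, like strcmp. */
int compare(const char *s, const char *t)
{
    int sd, td;
    int i, j;

    sd = ndigits(s);
    td = ndigits(t);

    /* presence of digits */
    if (!sd && !td)
        return strcasecmp(s, t);
    else if (!sd)
        return 1;
    else if (!td)
        return -1;

    /* value of digits, compared one digit at a time (not as whole
       numbers); zeros are skipped along with non-digits */
    for (i = 0, j = 0; i < sd && j < td; i++, j++) {
        while (! is_nonzero_digit(*s))
            s++;
        while (! is_nonzero_digit(*t))
            t++;

        if (*s != *t)
            return *s - *t;
        s++;
        t++;
    }

    /* number of digits */
    if (i < sd)
        return 1;
    else if (j < td)
        return -1;

    /* value of string after last digit */
    return strcasecmp(s, t);
}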

@user765443: For ("5 a", "5 b") digits are present in both strings, the digit values are equal, and the digit counts are equal, so the last line, `return strcasecmp(s, t);`, is evaluated. strcasecmp compares the remainders of the strings by character code, ignoring case. That means we're comparing (0x20 0x61) to (0x20 0x62). The first string is smaller. – Daniel Näslund, Apr 18 '12 at 09:00
Note that this does *not* handle the "a5" vs "a11" case correctly; I was a bit hasty when I wrote it. You'll have to add an inner loop to the `/* value of digits */` part. – Daniel Näslund, Apr 20 '12 at 05:53
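For what it's worth, here is one way the "inner loop" idea could look. This is a sketch, not the code from the answer: it walks both strings from left to right instead of checking for digits first, and it compares each whole run of digits as a number. The name natural_compare is made up, and treating whitespace as an ignorable separator and letters as case-insensitive are assumptions read off the examples in the question.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

namespace {
bool is_space(char c) { return std::isspace(static_cast<unsigned char>(c)) != 0; }
bool is_digit(char c) { return std::isdigit(static_cast<unsigned char>(c)) != 0; }
int lower(char c) { return std::tolower(static_cast<unsigned char>(c)); }
}

// Hypothetical helper; returns -1, 0 or 1.
int natural_compare(const std::string& s, const std::string& t) {
    std::size_t i = 0, j = 0;
    for (;;) {
        // Assumption: whitespace only separates terms, so skip it.
        while (i < s.size() && is_space(s[i])) ++i;
        while (j < t.size() && is_space(t[j])) ++j;
        if (i == s.size() || j == t.size()) break;

        if (is_digit(s[i]) && is_digit(t[j])) {
            // Inner loop: compare the two whole digit runs numerically.
            while (i < s.size() && s[i] == '0') ++i;   // skip leading zeros
            while (j < t.size() && t[j] == '0') ++j;
            std::size_t si = i, tj = j;
            while (i < s.size() && is_digit(s[i])) ++i;
            while (j < t.size() && is_digit(t[j])) ++j;
            std::size_t ls = i - si, lt = j - tj;
            if (ls != lt) return ls < lt ? -1 : 1;     // more digits means larger
            int c = s.compare(si, ls, t, tj, lt);      // same length: compare as text
            if (c != 0) return c < 0 ? -1 : 1;
        } else {
            // Single characters, ignoring case.
            int a = lower(s[i]), b = lower(t[j]);
            if (a != b) return a < b ? -1 : 1;
            ++i;
            ++j;
        }
    }
    if (i == s.size() && j == t.size()) return 0;
    return i == s.size() ? -1 : 1;   // the string that ran out first is smaller
}
```

With this sketch all four examples from the question come out as "less than", including "a5" vs "a11". Comparing digit runs by length first avoids converting them to integers, so very long numbers do not overflow.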

score -4 · Answer 4 · edited Jun 13 '12 at 13:12

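Try this and read about std::string::compare:

#include <iostream>
#include <string>

int main() {
    std::string fred = "a5";
    std::string joe = "a11";

    char x;

    // compare() returns 0 if the strings are equal, a negative value if
    // fred sorts first, and a positive value if joe sorts first.
    int result = fred.compare(joe);

    if (result < 0)
    {
        std::cout << "fred is less than joe" << std::endl;
    }
    else if (result > 0)
    {
        // Taken for "a5" vs "a11": the comparison is character by
        // character, and '1' sorts before '5'.
        std::cout << "joe is less than fred" << std::endl;
    }
    else
    {
        std::cout << "fred and joe are equal" << std::endl;
    }

    std::cin >> x;   // wait for input so the console window stays open
}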

answered Apr 18 '12 at 04:45 · sien

You're not handling the return value of compare() correctly: 0 means equal, otherwise the sign indicates which is larger. http://www.cplusplus.com/reference/string/string/compare/ – John Carter, Apr 18 '12 at 05:20
You didn't answer the OP's question and provided a bad example. – Ulterior, Apr 18 '12 at 05:49
