method for expand a-z to abc...xyz form

Question

Hi:) what i'm trying to do is write a simple program to expand from shortest entry

for example

a-z or 0-9 or a-b-c or a-z0-9

to longest write

for example

abc...xyz or 0123456789 or abc or abcdefghijklmnouprstwxyz0123456789

1-st examle shortest entry = 1-st example result which should give:)

so far i write something like this and it's work only for letters from a to z:

expand(char s[])
{
   int i,n,c;
   n=c=0;
   int len = strlen(s);
   for(i = 1;s[i] > '0' && s[i]<= '9' || s[i] >= 'a' && s[i] <= 'z' || s[i]=='-';i++)
   {
      /*c = s[i-1];
      g = s[i];
      n = s[i+1];*/
      if( s[0] == '-')
          printf("%c",s[0]);
      else if(s[i] == '-')
      {
         if(s[i-1]<s[i+1])
         {
             while(s[i-1] <= s[i+1])
             {
                 printf("%c", s[i-1]);
                 s[i-1]++;
             }
         }
         else if(s[i-1] == s[i+1])
             printf("%c",s[i]);
         else if(s[i+1] != '-')
             printf("%c",s[i]);
         else if(s[i-1] != '-')
             printf("%c",s[i]);
    }
    else if(s[i] == s[i+1])
    {

      while(s[i] == s[i+1])
      {
           printf("%c",s[i]);
           s[i]++;
      }
    }
    else if( s[len] == '-')
         printf("%c",s[len]);
 }

}

but now i'm stuck:(

any ideas what should i check to my program work correctly?

Edit1: @Andrew Kozak (1) abcd (2) 01234

Thanks for advance:)

For any sequence with a dash/hyphen, are you supposed to expand that to output all characters in the range? E.g. "a-d" --> "abcd" and "g-l" --> "ghijkl" ? — Andrew Kozak, Nov 02 '11 at 18:46
What would a-b-c result in? I'd guess it would simply result into ab-c OR a-bc? — sehe, Nov 02 '11 at 18:48
i wrote it o up of my question it should be simple abc without '-' symbol — Harry89pl, Nov 02 '11 at 18:50
You should strongly consider putting parenthesis around the different logical components in the main for-loop. The compiler may not be interpreting it like you want it to. Also, its very hard to read. — Akron, Nov 02 '11 at 18:53
@Akron it working for shorter expressions for example a-z or a-g problem is when i try expand expression like this a-z0-9 program expand only expression a-z in this case — Harry89pl, Nov 02 '11 at 19:01
`printf("%c", character)` can be shortened to `putchar(character)`. — zwol, Nov 02 '11 at 19:09
i can use only printf, for,while,switch,if statements for this exercise:( — Harry89pl, Nov 02 '11 at 19:11
It seems to me that there might be a type issue at play when you are making your comparisons in that long if() condition. What output do you actually get for "0-4" ? — Andrew Kozak, Nov 02 '11 at 19:16
for 0-4 i got 01234 like i want but problem is when i got 2 shortest expressions like (1)a-z0-9 or (2)0-9a-z in this case method work only for (1) a-z expression don't check and expand 0-9 and (2) 0-9 expression don't check a-z (1) output is abcdefghijklmnouprstwxyz and fo (2) is 0123456789 i can't findout what condition should i check for make it work:( — Harry89pl, Nov 02 '11 at 19:28
@harry180: I've managed a C++ _and_ a pure C solution with some torture tests and a bit of explanation. See **[my answer](http://stackoverflow.com/questions/7985661/method-for-expand-a-z-to-abc-xyz-form/7988413#7988413)**, of course, or head straight to http://ideone.com/sXM7b#info_3915048 to see the C version running live! — sehe, Nov 02 '11 at 23:40
@harry180: **Update** I published the **[benchmark code](https://gist.github.com/1335390#file_benchmark_expand_algorithms.cpp)** and **[the results](https://gist.github.com/1335390#gistcomment-60733)**: the C version is roughly 3.7x faster than the C++ version for 1024k iterations of all three test cases (see [here](http://ideone.com/sXM7b#info_3915048)) — sehe, Nov 03 '11 at 00:25

score 4 · Accepted Answer · edited May 23 '17 at 12:15

Here is a C version (in about 38 effective lines) that satisfies the same test as my earlier C++ version.

The full test program including your test cases, mine and some torture test can be seen live on http://ideone.com/sXM7b#info_3915048

Rationale

I'm pretty sure I'm overstating the requirements, but

this should be an excellent example of how to do parsing in a robust fashion
- use states in an explicit fashion
- validate input (!)
  - this version doesn't assume a-c-b can't happen
  - It also doesn't choke or even fail on simple input like 'Hello World' (or (char*) 0)
it shows how you can avoid printf("%c", c) each char without using extraneous functions.
I put in some comments as to explain what happens why, but overall you'll find that the code is much more legible anyways, by
- staying away from too many short-named variables
- avoiding complicated conditionals with un-transparent indexers
- avoiding the whole string length business: We only need max lookahead of 2 characters, and *it=='-' or predicate(*it) will just return false if it is the null character. Shortcut evaluation prevents us from accessing past-the-end input characters
ONE caveat: I haven't implemented a proper check for output buffer overrun (the capacity is hardcoded at 2048 chars). I'll leave it as the proverbial exercise for the reader

Last but not least, the reason I did this:

It will allow me to compare raw performance of the C++ version and this C version, now that they perform equivalent functions. Right now, I fully expect the C version to outperform the C++ by some factor (let's guess: 4x?) but, again, let's just see what suprises the GNU compilers have in store for us. ~~More later~~ Update turns out I wasn't far off: github (code + results)

Pure C Implementation

Without further ado, the implementation, including the testcase:

#include <stdlib.h>
#include <stdio.h>
#include <string.h>

int alpha_range(char c) { return (c>='a') && (c<='z'); }
int digit_range(char c) { return (c>='0') && (c<='9'); }

char* expand(const char* s)
{
    char buf[2048];

    const char* in  = s;
          char* out = buf;

    // parser state
    int (*predicate)(char) = 0; // either: NULL (free state), alpha_range (in alphabetic range), digit_range (in digit range)
    char lower=0,upper=0;       // tracks lower and upper bound of character ranges in the range parsing states

    // init
    *out = 0;

    while (*in)
    {
        if (!predicate)
        {
            // free parsing state
            if (alpha_range(*in) && (in[1] == '-') && alpha_range(in[2]))
            {
                lower = upper = *in++;
                predicate = &alpha_range;
            }
            else if (digit_range(*in) && (in[1] == '-') && digit_range(in[2]))
            {
                lower = upper = *in++;
                predicate = &digit_range;
            } 
            else *out++ = *in;
        } else
        { 
            // in a range
            if (*in < lower) lower = *in;
            if (*in > upper) upper = *in;

            if (in[1] == '-' && predicate(in[2])) 
                in++; // more coming
            else
            {
                // end of range mode, dump expansion
                char c;
                for (c=lower; c<=upper; *out++ = c++);
                predicate = 0;
            }
        }
        in++;
    }

    *out = 0; // null-terminate buf
    return strdup(buf);
}

void dotest(const char* const input)
{
    char* ex = expand(input);
    printf("input : '%s'\noutput: '%s'\n\n", input, ex);

    if (ex)
        free(ex);
}

int main (int argc, char *argv[])
{
    dotest("a-z or 0-9 or a-b-c or a-z0-9"); // from the original post
    dotest("This is some e-z test in 5-7 steps; this works: a-b-c. This works too: b-k-c-e. Likewise 8-4-6"); // from my C++ answer
    dotest("-x-s a-9 9- a-k-9 9-a-c-7-3"); // assorted torture tests

    return 0;
}

Test output:

input : 'a-z or 0-9 or a-b-c or a-z0-9'
output: 'abcdefghijklmnopqrstuvwxyz or 0123456789 or abc or abcdefghijklmnopqrstuvwxyz0123456789'

input : 'This is some e-z test in 5-7 steps; this works: a-b-c. This works too: b-k-c-e. Likewise 8-4-6'
output: 'This is some efghijklmnopqrstuvwxyz test in 567 steps; this works: abc. This works too: bcdefghijk. Likewise 45678'

input : '-x-s a-9 9- a-k-9 9-a-c-7-3'
output: '-stuvwx a-9 9- abcdefghijk-9 9-abc-34567'

ludo · Answer 2 · 2011-11-02T19:36:23.740

Ok I tested your program out and it seems to be working for nearly every case. It correctly expands a-z and other expansions with only two letters/numbers. It fails when there are more letters and numbers. The fix is easy, just make a new char to keep the last printed character, if the currently printed character matches the last one skip it. The a-z0-9 scenario didn't work because you forgot a s[i] >= '0' instead of s[i] > '0'. the code is:

#include <stdio.h>
#include <string.h>

void expand(char s[])
{
        int i,g,n,c,l;
    n=c=0;
    int len = strlen(s);
    for(i = 1;s[i] >= '0' && s[i]<= '9' || s[i] >= 'a' && s[i] <= 'z' || s[i]=='-';i++)
    {
        c = s[i-1];
        g = s[i];
        n = s[i+1];
        //printf("\nc = %c g = %c n = %c\n", c,g,n);
        if(s[0] == '-')
            printf("%c",s[0]);
        else if(g == '-')
        {
            if(c<n)
            {
                if (c != l){
                    while(c <= n)
                    {
                        printf("%c", c);
                        c++;
                    }
                    l = c - 1;
                //printf("\nl is %c\n", l);
            }
            else
            {
                c++;
                while(c <= n)
                {
                    printf("%c", c);
                    c++;
                }
                l = c - 1;
                //printf("\nl is %c\n", l);
            }
        }
        else if(c == n)
            printf("%c",g);
        else if(n != '-')
            printf("%c",g);
        else if(c != '-')
            printf("%c",g);
    }
    else if(g == n)
    {

        while(g == n)
        {
            printf("%c",s[i]);
            g++;
        }
    }
    else if( s[len] == '-')
        printf("%c",s[len]);
    }
    printf("\n");
}

int main (int argc, char *argv[])
{
    expand(argv[1]);
}

Isn't this problem from K&R? I think I saw it there. Anyway I hope I helped.

firstly thanks for help:) secondly: it work when expression is 0-9a-z but if it's a-z0-9 it work only feom a-z and don't expand 0-9 expression — Harry89pl, Nov 02 '11 at 19:36
i really don't know what and how i should thisfixed for a-z0-9 i should check if s[i] >= s[i+1] or s[i-1]>= s[i]? — Harry89pl, Nov 02 '11 at 21:02
s[i] >= 0 in the for loop instead of s[i] > 0. The code I posted works, just look it over. — ludo, Nov 02 '11 at 22:52
Here's a torture test for you: `"-x-s a-9 9- a-k-9 9-a-c-7-3"`. Your code returns '- - -' in this case. I prefer `-stuvwx a-9 9- abcdefghijk-9 9-abc-34567`. See http://ideone.com/sXM7b#info_3915048 — sehe, Nov 02 '11 at 23:38

score 1 · Answer 3 · answered Nov 02 '11 at 19:35

1

Based on the fact that the existing function addresses "a-z" and "0-9" sequences just fine, separately, we should explore what happens when they meet. Trace your code (try printing each variable's value at each step -- yes it will be cluttered, so use line breaks), and I believe you will find a logical short-circuit when iterating, for example, from "current token is 'y' and next token is 'z'" to "current token is 'z' and next token is '0'". Explore the if() condition and you will find that it does not cover all possibilities, i.e. you have covered yourself if you are within a<-->z, within 0<-->9, or exactly equal to '-', but you have not considered being at the end of one (a-z or 0-9) with your next character at the start of the next.

answered Nov 02 '11 at 19:35

Andrew Kozak

1,631
2
22
35

If you don't mind stepping back from the current methodology, you may find that you can approach this problem differently. If the input string contains a raw character, you are simply going to print it, and you are unconcerned with the previous or next tokens in the a-z or 0-9 sequence. E.g. "adyz" outputs "adyz". That said, you are really just looking for '-'. Could you imagine a simpler method that did something like "if( curr != '-' ){ output curr; }else{ determine range from prev and next and loop through range; }? If so, we could explore that type of solution. – Andrew Kozak Nov 02 '11 at 19:39
I think I managed to implement you idea - without realizing it - [here`.`](http://stackoverflow.com/questions/7985661/method-for-expand-a-z-to-abc-xyz-form/7988413#7988413) – sehe Nov 02 '11 at 23:11

score 1 · Answer 4 · answered Nov 02 '11 at 21:58

Just for fun, I decided to demonstrate to myself that C++ is really just as suited to this kind of thing.

Test-first, please

First, let me define the requirements a little more strictly: I assumed it needs to handle these cases:

int main()
{
    const std::string in("This is some e-z test in 5-7 steps; this works: a-b-c. This works too: b-k-c-e. Likewise 8-4-6");
    std::cout << "input : " << in         << std::endl;
    std::cout << "output: " << expand(in) << std::endl;
}

input : This is some e-z test in 5-7 steps; this works: a-b-c. This works too: b-k-c-e. Likewise 8-4-6
output: This is some efghijklmnopqrstuvwxyz test in 567 steps; this works: abc. This works too: bcdefghijk. Likewise 45678

C++0x Implementation

Here is an implementation (actually a few variants) in 14 lines _{^{(23 including whitespace, comments)}} of C++0x code¹

static std::string expand(const std::string& in)
{
    static const regex re(R"([a-z](?:-[a-z])+|[0-9](?:-[0-9])+)");

    std::string out;

    auto tail = in.begin();
    for (auto match : make_iterator_range(sregex_iterator(in.begin(), in.end(), re), sregex_iterator()))
    {
        out.append(tail, match[0].first);

        // char range bounds: the cost of accepting unordered ranges...
        char a=127, b=0;
        for (auto x=match[0].first; x<match[0].second; x+=2)
            { a = std::min(*x,a); b = std::max(*x,b); }

        for (char c=a; c<=b; out.push_back(c++));
        tail = match.suffix().first;
    }
    out.append(tail, in.end());

    return out;
}

Of course I'm cheating a little because I'm using regex iterators from Boost. I will do some timings comparing to the C version for performance. I rather expect the C++ version to compete within a 50% margin. But, let's see what kind of surprises the GNU compiler ahs in store for us :)

Here is a complete program that demonstrates the sample input. _It also contains some benchmark timings and a few variations that trade-off

functional flexibility
legibility / performance

#include <set> // only needed for the 'slow variant'
#include <boost/regex.hpp>
#include <boost/range.hpp>

using namespace boost;
using namespace boost::range;

static std::string expand(const std::string& in)
{
//  static const regex re(R"([a-z]-[a-z]|[0-9]-[0-9])"); // "a-c-d" --> "abc-d", "a-c-e-g" --> "abc-efg"
    static const regex re(R"([a-z](?:-[a-z])+|[0-9](?:-[0-9])+)");

    std::string out;
    out.reserve(in.size() + 12); // heuristic

    auto tail = in.begin();
    for (auto match : make_iterator_range(sregex_iterator(in.begin(), in.end(), re), sregex_iterator()))
    {
        out.append(tail, match[0].first);

        // char range bounds: the cost of accepting unordered ranges...
#if !SIMPLE_BUT_SLOWER
        // debug 15.149s / release 8.258s (at 1024k iterations)
        char a=127, b=0;
        for (auto x=match[0].first; x<match[0].second; x+=2)
        { a = std::min(*x,a); b = std::max(*x,b); }

        for (char c=a; c<=b; out.push_back(c++));
#else   // simpler but slower
        // debug 24.962s / release 10.270s (at 1024k iterations)
        std::set<char> bounds(match[0].first, match[0].second);
        bounds.erase('-');
        for (char c=*bounds.begin(); c<=*bounds.rbegin(); out.push_back(c++));
#endif
        tail = match.suffix().first;
    }
    out.append(tail, in.end());

    return out;
}

int main()
{
    const std::string in("This is some e-z test in 5-7 steps; this works: a-b-c. This works too: b-k-c-e. Likewise 8-4-6");
    std::cout << "input : " << in         << std::endl;
    std::cout << "output: " << expand(in) << std::endl;
}

_{¹ Compiled with g++-4.6 -std=c++0x}

Weehoo after trying to run the existing implementations it turns out I have interpreted the _specs_ way stricter (I mean, way, way stricter) than the OP and the other answers. That basically means that all performance comparisons are going to be stranded. That's a pity. Someone up for a nice _robust_ implementation in C? — sehe, Nov 02 '11 at 22:20
Nevermind, I did the **[pure C version myself (here)](http://stackoverflow.com/questions/7985661/method-for-expand-a-z-to-abc-xyz-form/7988413#7988413)**. I think your tutor would be a bit surprised if you handed it in like that - but I merely intended it to show you some high-level approaches that are commonly used in parsing text with a hand-crafted text parser like this. — sehe, Nov 02 '11 at 23:13
**Update** I published the **[benchmark code](https://gist.github.com/1335390#file_benchmark_expand_algorithms.cpp)** and **[the results](https://gist.github.com/1335390#gistcomment-60733)**: the C version is roughly 3.7x faster than the C++ version for 1024k iterations of all three test cases (see [here](http://ideone.com/sXM7b#info_3915048)) — sehe, Nov 03 '11 at 00:25

score 0 · Answer 5 · answered May 02 '21 at 10:13

This is a Java implementation. It expands the character ranges similar to 0-9, a-z and A-Z. Maybe someone will need it someday and Google will bring them to this page.

package your.package;

public class CharacterRange {

    /**
     * Expands character ranges similar to 0-9, a-z and A-Z.
     * 
     * @param string a string to be expanded
     * @return a string
     */
    public static String expand(String string) {

        StringBuilder buffer = new StringBuilder();

        int i = 1;
        while (i <= string.length()) {
            final char a = string.charAt(i - 1); // previous char
            if ((i < string.length() - 1) && (string.charAt(i) == '-')) {
                final char b = string.charAt(i + 1); // next char
                char[] expanded = expand(a, b);
                if (expanded.length != 0) {
                    i += 2; // skip
                    buffer.append(expanded);
                } else {
                    buffer.append(a);
                }
            } else {
                buffer.append(a);
            }
            i++;
        }

        return buffer.toString();
    }

    private static char[] expand(char a, char b) {
        char[] expanded = expand(a, b, '0', '9'); // digits (0-9)
        if (expanded.length == 0) {
            expanded = expand(a, b, 'a', 'z'); // lower case letters (a-z)
        }
        if (expanded.length == 0) {
            expanded = expand(a, b, 'A', 'Z'); // upper case letters (A-Z)
        }
        return expanded;
    }

    private static char[] expand(char a, char b, char min, char max) {

        if ((a > b) || !(a >= min && a <= max && b >= min && b <= max)) {
            return new char[0];
        }

        char[] buffer = new char[(b - a) + 1];
        for (int i = 0; i < buffer.length; i++) {
            buffer[i] = (char) (a + i);
        }
        return buffer;
    }

    public static void main(String[] args) {

        String[] ranges = { //
                "0-9", "a-z", "A-Z", "0-9a-f", "a-z2-7", "0-9a-v", //
                "0-9a-hj-kmnp-tv-z", "0-9a-z", "1-9A-HJ-NP-Za-km-z", //
                "A-Za-z0-9", "A-Za-z0-9+/", "A-Za-z0-9-_" };

        for (int i = 0; i < ranges.length; i++) {

            String input = ranges[i];
            String output = CharacterRange.expand(ranges[i]);

            System.out.println("input:  " + input);
            System.out.println("output: " + output);
            System.out.println();
        }
    }
}

Output:

input:  0-9
output: 0123456789

input:  a-z
output: abcdefghijklmnopqrstuvwxyz

input:  A-Z
output: ABCDEFGHIJKLMNOPQRSTUVWXYZ

input:  0-9a-f
output: 0123456789abcdef

input:  a-z2-7
output: abcdefghijklmnopqrstuvwxyz234567

input:  0-9a-v
output: 0123456789abcdefghijklmnopqrstuv

input:  0-9a-hj-kmnp-tv-z
output: 0123456789abcdefghjkmnpqrstvwxyz

input:  0-9a-z
output: 0123456789abcdefghijklmnopqrstuvwxyz

input:  1-9A-HJ-NP-Za-km-z
output: 123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz

input:  A-Za-z0-9
output: ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789

input:  A-Za-z0-9+/
output: ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/

input:  A-Za-z0-9-_
output: ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_

method for expand a-z to abc...xyz form

5 Answers5

Rationale

Pure C Implementation

Test-first, please

C++0x Implementation

Linked