Is there a better way to determine multiple ranges of character?

Question

I'm currently writing code in C that selects symbols and numbers from the full set of ASCII characters. As a beginner programmer, I usually write

if ((i > 25 && i < 50) || (i > 100 && i < 200)) { contents }

so that the condition holds when the variable i is between 25 and 50 or between 100 and 200 (exclusive).

If I want to set multiple ranges like 32~64 (! to @), 91~96 ([ to `) and 123~126 ({ to ~), is there a better way (meaning shorter or simpler code), or should I stick with this method and keep adding each range as in the code above?

One hint: you could also use the ASCII characters themselves instead of the numbers. – wake-0, Jul 13 '16 at 04:57
Well, I would suggest you define a function `in_range(int min, int max)` and call this function. — ckruczek, Jul 13 '16 at 05:05
Interesting question. I'm not sure if there is an easier way to determine if something is in a particular range, but there might be an easier way to determine if something is ASCII, as shown in [this question](http://stackoverflow.com/questions/9234886/check-if-a-char-is-ascii-using-bitmasks-and-bit-operators-in-c) – iRove, Jul 13 '16 at 05:05
I usually use masks: `if ((chr & mask) == cond)`. This lets you select various sets of characters: all odd symbols (`mask = 0x01; cond = 0x01`), all symbols from the first line of the [ASCII table](http://ascii-table.com/index.php) (`mask = 0x0F; cond = 0x00`). By combining these checks I can implement any filter. – imbearr, Jul 13 '16 at 05:09
If you want `! to @`, then why not use `'!'` and `'@'` instead of 32 and 64? Much more readable and precise. – phuclv, Jul 13 '16 at 05:24
I'd rewrite your code as: `((25 < i && i < 50) || (100 < i && i < 200))`. Consistently writing this kind of logic in an intuitively understandable way will reduce your cognitive load and make the code easier to read. — Richard, Jul 13 '16 at 05:37
about ` inside code: http://meta.stackexchange.com/questions/82718/how-do-i-escape-a-backtick-in-markdown — bolov, Jul 13 '16 at 06:35
@MatthieuM. Actually this was not for work or business purposes, but I did have performance in mind because it is a keyboard input automaton for Korean Hangul with ASCII characters. I believe including a new header would cost about the same as, or more than, the code in the question, but machines these days are so fast that I couldn't detect much difference. (Or does the accepted answer actually reduce the cost?) – Kagamin, Jul 13 '16 at 08:02

score 14 · Accepted Answer · answered Jul 13 '16 at 05:15

For your specific case, the <ctype.h> collection of functions would do:

if (isprint(i) && !isalpha(i))

Added bonus: It even works on non-ascii systems.
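A minimal sketch of how that check might be wrapped in a helper. The function name is hypothetical, and the cast reflects the <ctype.h> requirement that the argument be representable as unsigned char (or be EOF); the comment about which characters match assumes the "C" locale.

```c
#include <ctype.h>
#include <stdbool.h>

/* Hypothetical helper: true for printable characters that are not letters.
   In the "C" locale that means space, digits and punctuation. */
static bool is_printable_non_alpha(int c)
{
    /* The <ctype.h> functions need a value representable as
       unsigned char (or EOF), so convert before calling them. */
    unsigned char uc = (unsigned char)c;
    return isprint(uc) && !isalpha(uc);
}
```

Calling `is_printable_non_alpha('@')` is true, while `is_printable_non_alpha('a')` and `is_printable_non_alpha('\n')` are false.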

a3f

I had hoped to find a way that also covers other situations requiring multiple ranges, but this is neat enough for now. – Kagamin Jul 13 '16 at 05:21
Careful, these are locale-dependent, which may or may not be what is actually needed (if I'm parsing a programming language, for example, typically I don't want it to be locale-dependent). OTOH they work as expected in the "C" locale, which is the only one that you should be using anyway. – Matteo Italia Jul 13 '16 at 05:30

Sergey · Answer 2 · score 2 · 2016-07-13T05:49:24.580

You can write a function that checks if the value belongs to any of given ranges:

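#include <stdbool.h>
#include <stddef.h>

/* A range with exclusive bounds: min < value < max. */
struct Range {
        int min;
        int max;
};

/* Returns true if character lies strictly inside any of the given ranges. */
bool in_ranges(int character, const struct Range *ranges, size_t num_ranges) {
        for(size_t i = 0; i < num_ranges; ++i) {
                if(ranges[i].min < character && character < ranges[i].max)
                        return true;
        }
        return false;
}

int main(void) {
        struct Range rngs[] = {{25,50}, {100,200}};
        /* '@' is 64 in ASCII, so this is false for these ranges. */
        bool at_sign_is_in_range = in_ranges('@', rngs, sizeof rngs / sizeof rngs[0]);
        (void)at_sign_is_in_range;
        return 0;
}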

It makes editing ranges much simpler and improves readability. Also, if you continue to write all the ranges in the condition itself, as in your example, consider writing each check like

lower_bound < value && value < upper_bound

It looks like mathematical notation (x < a < y) and also seems easier to read.

edited Jul 13 '16 at 05:49

answered Jul 13 '16 at 05:15

Sergey

Honestly, besides being probably way less efficient, it doesn't look any more readable at all; most importantly, there's no indication whatsoever about whether the ranges are `[,]` (as usually are in common parlance), `[,)` (as usually are in programming) or `(,)` (which are rarely used, but are what is actually needed in this case). That's why I advocate for keeping this kind of logic either extremely explicit or extremely local - hunting off-by-ones is even more difficult if you need to continuously jump around in utility functions. – Matteo Italia Jul 13 '16 at 05:54
@MatteoItalia Debugging functions is easier than debugging macros. They are also unsafe: MACRO( i++ , j ), etc... Premature optimization is what...? The code is also very readable and the intent is clear. The section about inclusive/exclusive ranges is a strawman argument. If it *were relevant* at all, you would simply add the suffix inclusive/exclusive to the function name. – 2501 Jul 13 '16 at 06:21
@2501: you are not getting it; the point is very simple: if you are not striving for generality *and* you want terseness, you can use the macro I wrote; it has all the kind of limitations *exactly* because it's extremely ad-hoc, but it's extremely concise and serves exactly that purpose. The `i++` argument applies to any macro, and any half decent C programmer knows better - especially since the fact that it's a macro and that its argument is evaluated multiple times is *extremely clear* because the definition is right there. If instead I'm writing a function like yours, it's because ... – Matteo Italia Jul 13 '16 at 06:42
I'm going to leave these two things here: http://stackoverflow.com/q/652788/4082723 and `#define B(l, h) ((l) – 2501 Jul 13 '16 at 06:46
@2501 I'm striving for generality - that is going to stay in some kind of shared header for me to use in future similar computations. Now, I refuse to believe that in your program you'll check only against `(,)` ranges (if anything, the most checked-against range type is `[,)`). So, you'll want to have `in_ranges_inc_inc`, `in_ranges_inc_exc`, `in_ranges_exc_exc` and `in_ranges_exc_inc` or something like that. But now, what are you gaining against typing out the expression in full? You lost in flexibility (the ranges that OP wants to check are actually more easily written with some ... – Matteo Italia Jul 13 '16 at 06:46
@MatteoItalia As I have said, this is a strawman argument; your macro suffers from the same problem. – 2501 Jul 13 '16 at 06:48
@2501: mixed ranges), and you replaced familiar syntax (regular C expressions) with stuff that another programmer has to look up. To wrap it up: IMO this is just a false generalization that doesn't yield anything useful, it only adds clutter. If you do want to keep it extremely terse use an ad-hoc macro that does exactly what you need *here* without hoping to reuse it, otherwise just type out the expression in full (which is what I actually do at the bottom of my answer). – Matteo Italia Jul 13 '16 at 06:49

score 1 · Answer 3 · answered Jul 13 '16 at 13:47

If you are using single byte characters, you may be able to get better performance using an array of flags, setting either individual bits or whole bytes to indicate character values that are in one of the ranges.
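A minimal sketch of the whole-byte variant of such a flag table. The helper names are hypothetical, and the three ranges are the ones from the question written as character literals, which assumes an ASCII-compatible character set where those ranges are contiguous.

```c
#include <limits.h>
#include <stdbool.h>

/* One flag per possible unsigned char value; zero-initialised. */
static unsigned char in_set[UCHAR_MAX + 1];

/* Hypothetical helper: mark every value in the inclusive range [lo, hi]. */
static void mark_range(unsigned char lo, unsigned char hi)
{
    for (int c = lo; c <= hi; ++c)
        in_set[c] = 1;
}

/* Fill the table once with the three ranges from the question. */
static void init_in_set(void)
{
    mark_range('!', '@');
    mark_range('[', '`');
    mark_range('{', '~');
}

/* After init_in_set(), membership is a single array lookup. */
static bool is_in_set(unsigned char c)
{
    return in_set[c] != 0;
}
```

The cost of building the table is paid once; each later test is one indexed load, regardless of how many ranges were marked.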

If you are writing code for an Intel processor that supports the SSE 4.2 instructions, you might want to consider using PCMPISTRI or similar, which can compare up to 16 single-byte characters against up to 8 different ranges in a single instruction.

score 1 · Answer 4 · answered Jul 13 '16 at 16:35

My answer would be "it depends". :)

If isalpha() and friends from ctype.h do what you want, then absolutely use them.

But if not...

If you only had two ranges, as in your example snippet, I don't think it looks too messy. If there are more, maybe put the range test in an (inline) function to reduce the number of booleans visible at a time:

if (in_range(val, a1, b1) || in_range(val, a2, b2) || ... )

(Or name it B(n,a,b) if you feel the need to save screen real estate.)

If the ranges might change in run-time, or there are lots of them, put the limits in a struct and loop through an array of those. If there truly are many, sort the list and do something smart with it, like a binary search over the lower limits (or whatever). But for a small number, I wouldn't bother.
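One way the "sort the list and binary search" idea might look, as a sketch only. The struct and function names are hypothetical, the bounds are treated as inclusive, and the search assumes the array is sorted by lower limit with no overlapping ranges.

```c
#include <stdbool.h>
#include <stddef.h>

/* Inclusive range [lo, hi]. */
struct range {
    int lo;
    int hi;
};

/* Assumes `ranges` is sorted by `lo` and the ranges do not overlap. */
static bool in_sorted_ranges(int value, const struct range *ranges, size_t count)
{
    size_t left = 0, right = count;   /* search window is [left, right) */

    while (left < right) {
        size_t mid = left + (right - left) / 2;
        if (value < ranges[mid].lo)
            right = mid;              /* value lies below this range */
        else if (value > ranges[mid].hi)
            left = mid + 1;           /* value lies above this range */
        else
            return true;              /* lo <= value <= hi */
    }
    return false;
}
```

For a handful of ranges the plain linear loop is likely just as fast; the binary search only starts to pay off when the list is long.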

If the total range of allowed values is small (like unsigned chars with values 0..255), but the number of separate "ranges" is large ("all those with prime values"), then make a table (bitmap) of the values, and test against that. Generate the table any way you like. (isalpha() is probably implemented like this)

unsigned char is_prime[256] = {0, 0, 1, 1, 0, 1, 0, 1, 
    ...};

if (is_prime[val]) { ...
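As one possible way to generate such a table at start-up instead of typing it out, here is a sketch using a sieve of Eratosthenes. The names `is_prime_table` and `init_is_prime_table` are hypothetical and chosen so they do not clash with the array above.

```c
#include <stddef.h>

static unsigned char is_prime_table[256];

/* Fill is_prime_table[n] with 1 when n is prime, 0 otherwise. */
static void init_is_prime_table(void)
{
    is_prime_table[0] = 0;              /* 0 and 1 are not prime */
    is_prime_table[1] = 0;
    for (size_t n = 2; n < 256; ++n)
        is_prime_table[n] = 1;          /* assume prime until crossed out */

    /* Every composite below 256 has a prime factor p with p * p < 256. */
    for (size_t p = 2; p * p < 256; ++p) {
        if (!is_prime_table[p])
            continue;
        for (size_t m = p * p; m < 256; m += p)
            is_prime_table[m] = 0;      /* cross out multiples of p */
    }
}
```

After calling `init_is_prime_table()` once, the test is the same single lookup: `if (is_prime_table[val])`.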

Matteo Italia · Answer 5 · 2016-07-13T06:03:27.500

You can hide the duplication of l<x && x<h in a macro or an inline function, but I found that it's rarely worth it: it's not as readable as Python's l<x<h syntax, and it quickly gets out of hand once you start to have macros for all the inclusive/exclusive boundary combinations. Either you end up with a ridiculously long naming convention (between_inc_inc, between_inc_exc, ... which kinda defeats factoring out the check in the first place) or you leave the reader wondering about your range checks ("between(i, 50, 100)... is it a [,) range? a [,] one? (checks the code) nope it's a (,)"), which is terrible if you are hunting off-by-one errors.

OTOH, I'm known to abuse "single letter macros", which I define exactly where and how they are needed, and which are undefined immediately after. Although they may look ugly, the point is that they are extremely local and do exactly what needs to be done, so there's no time wasted in looking them up, there are no cryptic parameters and they can factor out the bulk of repeated computation.

In your case, if the list is significantly long I may do

#define B(l, h) (((l)<i) && (i<(h))) ||

if(B(25,50) B(100,200) B(220, 240) 0)
... 
#undef B

(never do this in a header!)

What does give a good boost in readability is using character literals instead of ASCII numbers: for example, if you want the a-z range, do 'a'<=i && i<='z'.

You seem to want to exclude alphabetical and non printable characters: you can do that with

if((' '<=i && i<'A') || (i>'Z' && i<'a') || ('z'<i && i<=126))
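Applying the same character-literal idea to the three ranges named in the question could look like the sketch below. The function name is hypothetical, the bounds are inclusive and follow the characters the question lists ('!' to '@', '[' to '`', '{' to '~'), and an ASCII-compatible character set is assumed.

```c
#include <stdbool.h>

/* Hypothetical helper; assumes an ASCII-compatible execution character set. */
static bool is_ascii_symbol_or_digit(int c)
{
    return ('!' <= c && c <= '@')    /* 33..64: punctuation and digits */
        || ('[' <= c && c <= '`')    /* 91..96 */
        || ('{' <= c && c <= '~');   /* 123..126 */
}
```

Unlike the expression above, this version excludes the space character, because the question names '!' as the first character of the first range.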

Ah, there's the guaranteed downvote you get when talking about macros! — Matteo Italia, Jul 13 '16 at 05:47
I think this is an example of macro abuse in C. Avoid it if you can. Here is a good modern approach: http://stackoverflow.com/a/38343152/4082723 — 2501, Jul 13 '16 at 06:14
@2501: IMO that approach is way worse for the reasons I outlined in the comment above. — Matteo Italia, Jul 13 '16 at 06:24
Your macro will blow up spectacularly when, for example: `B(i++,2), B(i&1,2) , B(i?2:3,1)`, etc.. This is so unsafe it's not even funny. This is macro abuse by definition: `#define B(l, h) ((l) — 2501, Jul 13 '16 at 06:27

score 0 · Answer 6 · answered Jul 13 '16 at 05:11

You can write a function like:

#include <stdbool.h>

/* True if begin < num < end (both bounds exclusive). */
bool withinscope(int num, int begin, int end){
    if(num > begin && num < end)
        return true;
    return false;
}

Then you can use this function and keep the code clean and simple.

hexiecs

Booleans are plain values, not just things to be used in `if` guards. One can directly `return (num > begin && num < end)`. – chi Jul 13 '16 at 09:16

meJustAndrew · Answer 7 · score 0 · 2016-07-13T10:08:58.657

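#include <cstddef>
#include <vector>

class RangeCollection
{
    // Bounds are stored flat: lower0, upper0, lower1, upper1, ...
    std::vector<int> ranges;
public:
    void AddRange(int lowerBound, int upperBound)
    {
        ranges.push_back(lowerBound);
        ranges.push_back(upperBound);
    }

    // True if num lies strictly between the bounds of any stored range.
    bool IsInRange(int num) const
    {
        // i + 1 < size() avoids the unsigned wrap-around of size() - 1 on an empty vector.
        for (std::size_t i = 0; i + 1 < ranges.size(); i += 2)
        {
            if (num > ranges[i] && num < ranges[i + 1]) return true;
        }
        return false;
    }
};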

You can call AddRange to add as many ranges as you want, then you can check if a number is in range.

RangeCollection rc;
rc.AddRange(20,25);
rc.IsInRange(22);//returns true

edited Jul 13 '16 at 10:08

answered Jul 13 '16 at 06:27

meJustAndrew

Should increment the loop 2 by 2. Or use std::pairs. – Guilherme Bernal Jul 13 '16 at 09:39
Using an explicit pair type would be better coding practice, since it makes explicit that these numbers go together and makes it harder to erroneously access numbers from separate pairs together. – zstewart Jul 13 '16 at 11:06
@zstewart Sincerely, this is what I thought too, but I didn't take the time to update this answer. Indeed, making a type which explicitly takes these numbers as a pair is a clearly superior practice! – meJustAndrew Jul 13 '16 at 11:10
Question asked about C not C++, -1 – cat Jul 13 '16 at 12:38
