I can't understand why looping with `arr[i] != '\0'` never performs all the removals it is supposed to. I've been using code similar to this:

/* This one is supposed to remove all the special characters and the digits 0-9 from all entries */

void keepAlphaOnly(char line[50])
{
    int i = 0, j;
    for (i = 0; line[i] != '\0'; i++) {
        if (line[i] < 65 || line[i] > 122 || (line[i] > 90 && line[i] < 97)) {
            for (j = i; j <= strlen(line); j++) {
                line[j] = line[j + 1];
            }
            --i;//EDIT: adjust index to avoid skipping next char
        }
    }
}

int main(void) {
    int i, j, N;
    char str[100][50];

    scanf("%d\n", &N);
    for (i = 0; i < N; ++i)
        gets(str[i]);
    for (i = 0; i < N; i++)
        keepAlphaOnly(str[i]);
    for (i = 0; i < N; i++)
        printf("%s\n", str[i]);
    return 0;
}

I use loops like this to remove characters from a string, or to sort or search within it, but they never process the whole string: some characters that should have been removed are left behind, exactly as they were in the original string. Why?

chqrlie
Abhinav S
  • yikes - you are modifying the same string that you are iterating. – Daniel A. White Oct 10 '15 at 12:52
  • The omission of the null terminator aside, your solution doesn't work when you have consecutive non-alpha characters, because you skip every other character after shifting the array. You could turn this into a `while` loop, but the better and simpler solution is Ed Heal's below. – M Oehm Oct 10 '15 at 13:01
  • @MOehm - Thanks for the pat on the back – Ed Heal Oct 10 '15 at 13:04
  • Use the 2 finger approach: keep a separate index for reading and writing back the filtered characters to the same array, and do not forget the final `'\0'` – chqrlie Oct 10 '15 at 13:13
  • @chqrlie - Pity that was 16 minutes too late – Ed Heal Oct 10 '15 at 13:15
  • @EdHeal: sorry for my top down reading approach... Your answer illustrates my late recommendation. I shall vote it up if you fix the issue for `isalpha` – chqrlie Oct 10 '15 at 13:17
  • There is no issue with `isalpha` – Ed Heal Oct 10 '15 at 13:18
  • @EdHeal: I'm afraid there is... – chqrlie Oct 10 '15 at 13:24
  • You should also check `scanf` return value, N is indeterminate if no value was parsed by `scanf`; then you should check that `N` is no larger than `100`, and you should use `fgets` instead of `gets` to prevent buffer overflow. Also `j` is unused. – chqrlie Oct 10 '15 at 13:43
  • @MOehm That was such an awesome catch. Can't believe I missed that "skipping indices" problem. I've tried all my other code with similar shifts/removals using a while loop, but I was making the same skipping mistake there too. Works perfectly now, thanks. – Abhinav S Oct 10 '15 at 14:10
  • @EdHeal Thanks for the code with the built-in function too, but I think I somewhat agree with @M Oehm about the unsigned char. Still, thanks – Abhinav S Oct 10 '15 at 14:18
  • Technically, there are more details to fix. Ed Heal's loop is a much better approach to your problem. For instance `j <= strlen(line)` reevaluates the string length for each iteration and iterates one time too many. – chqrlie Oct 10 '15 at 14:19
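
The comments above suggest turning the shifting loop into a `while` loop and replacing `gets` with `fgets`. Here is a minimal sketch of both ideas, not a definitive fix. The names `keepAlphaOnlyShift` and `readLine` are hypothetical and do not appear in the original post, and the character ranges assume ASCII, as the numeric constants in the question do.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Shifting variant of the question's function, written as a while loop.
   The index advances only when the current character is kept, so runs of
   consecutive non-letters are all removed. Assumes ASCII, like the
   numeric ranges in the question. */
void keepAlphaOnlyShift(char *line)
{
    size_t i = 0;

    assert(line != NULL);
    while (line[i] != '\0') {
        char c = line[i];
        if ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')) {
            i++;    /* keep this character and move on */
        } else {
            /* Shift the tail left by one, including the '\0'. */
            memmove(&line[i], &line[i + 1], strlen(&line[i + 1]) + 1);
        }
    }
}

/* Hypothetical helper (not in the original post): reads one line with
   fgets instead of gets and strips the trailing newline.
   Returns 1 on success, 0 on end of file or read error. */
int readLine(char *buf, size_t size, FILE *in)
{
    assert(buf != NULL && size > 0 && in != NULL);
    if (fgets(buf, (int)size, in) == NULL)
        return 0;
    buf[strcspn(buf, "\n")] = '\0';
    return 1;
}
```

In `main`, this would be combined with checking that `scanf` returned 1 and that `N` is between 0 and 100 before calling `readLine(str[i], sizeof str[i], stdin)` for each entry. The shifting loop is still quadratic in the worst case, which is one reason the comments prefer the two-index loop from the answer.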

1 Answer

You have not written the null character at the end of the "new" string.

But you can do this instead:

int from = 0, to = 0;
for (; line[from]; ++from) {
    if (isalpha(line[from])) {
        line[to] = line[from];
        to++;
    }
}
line[to] = 0;
Ed Heal
  • `isalpha(line[from])` is incorrect, it should be `isalpha((unsigned char)line[from])` – chqrlie Oct 10 '15 at 13:15
  • No it is not, neither in C nor in C++. If the `char` type happens to be signed by default (as is very common for many targets), characters beyond ASCII will have negative values, for which the functions from `<ctype.h>` will invoke undefined behaviour. – chqrlie Oct 10 '15 at 13:19
  • @chqrlie - Do you have proof of this? BTW - Macros do not give a rats arse about type – Ed Heal Oct 10 '15 at 13:22
  • C11 Section 7.4: *The header `<ctype.h>` declares several functions useful for classifying and mapping characters. In all cases the argument is an int, the value of which shall be representable as an unsigned char or shall equal the value of the macro EOF. If the argument has any other value, the behavior is undefined.* – chqrlie Oct 10 '15 at 13:22
  • Anyway - Why cast it to an `unsigned char` when the parameter should be an `int` – Ed Heal Oct 10 '15 at 13:25
  • To undo the effect of the sign extension in case `char` is signed and its value is negative. – chqrlie Oct 10 '15 at 13:26
  • The top bit is the sign (twos complement). We are not interested in that bit as `isalpha` checks the other bits. (why make negatives positives - you lose information) – Ed Heal Oct 10 '15 at 13:30
  • The glibc goes to great pains trying to minimize adverse side effects of this common mistake by defining the char type tables to have 384 entries with an offset of 128, effectively giving consistent results for values between -128 and 255 inclusive. But this solution still cannot distinguish between legitimate `char` value `'\377'` and the end of file indicator `EOF`, both equal to `-1`. – chqrlie Oct 10 '15 at 13:32
  • Your assumption about the top bit and `isalpha` is incorrect: `isalpha` uses the **value** it receives as an `int`. If you pass a signed `char` with the *top bit* set, the value will be negative (yielding undefined behaviour), whereas if you cast the `char` to `unsigned char`, the value will be in the correct range. – chqrlie Oct 10 '15 at 13:35
  • Are you saying that the conversion from 8 bits to (32/64) bits gets confused with its value. It does have a lot more room. Would you like to experiment on that? – Ed Heal Oct 10 '15 at 13:41
  • No confusion involved ;-) converting signed `char` values to `int` preserves the value. Signed `char` values with the *high bit* set are negative. Converting them to `int` keeps the same range: -128 .. 127 for 8-bit `char`. The functions from `<ctype.h>` are not defined for the negative values except `EOF`. This subtle issue is a common source of bugs, potentially fatal and not necessarily reproducible. – chqrlie Oct 10 '15 at 13:48
  • Read this question for a more in depth analysis, and fix the code in your answer: http://stackoverflow.com/questions/5029840/convert-char-to-int-in-c-and-c/5031906#5031906 – chqrlie Oct 10 '15 at 13:50
  • So please explain why the cast to unsigned is required? – Ed Heal Oct 10 '15 at 14:03
  • You must cast a `char` argument as `(unsigned char)` for its value to be constrained within the range of values of type *unsigned char* as mandated by the Standard. – chqrlie Oct 10 '15 at 14:12
  • But the standard requires an int - http://www.cplusplus.com/reference/cctype/isalpha/ - Correct me if I am wrong that has a sign – Ed Heal Oct 10 '15 at 14:15
  • Yes, and the only accepted negative value is EOF. Other values must be positive, which they won't be when signed char is extended to an int. They will be when unsigned char is extended. – Sami Kuhmonen Oct 10 '15 at 14:23
  • @edheal: since `isalpha`'s prototype specifies an `int`, the provided argument will be converted to an `int` automatically. If the argument is a signed char, the value will be sign-extended; if the argument is unsigned, it will not be. As chqrlie says, sign-extending a char with its sign bit set will either produce an out-of-range value or an incorrect value of EOF. – rici Oct 10 '15 at 14:28
  • I am not pretending anything else. The `int` value passed to `isalpha` needs to be in the range `0 .. UCHAR_MAX` or `EOF`. `EOF` itself is negative. `char` values must be converted to `unsigned char` to comply with the specification. Technically, you could write `isalpha((int)(unsigned char)line[from])` but the `(int)` cast is implicit. BTW the example code in the page you link to is incorrect. It does not fail because all characters in `"C++"` happen to be positive in ASCII, but it would fail for EBCDIC where `'C'` is `'\303'`. – chqrlie Oct 10 '15 at 14:28
  • I know it is surprising to discover such ugly details in the language... Type `char` should really be unsigned by default. I love your motto: *Also I think that code should be readable and maintainable. Why make things complicated when there is a simple, readable and understandable solution that you can come back to in a year and understand without having to bang your head against a brick wall?!* – chqrlie Oct 10 '15 at 23:34
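
To make the outcome of this thread concrete, here is a minimal sketch of the answer's two-index loop with the `(unsigned char)` cast that the comments ask for. The function name `filterAlpha`, the pointer parameter, and the returned length are assumptions added for illustration and are not part of the original answer.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Keeps only alphabetic characters, in place, using a read index (from)
   and a write index (to). Returns the new length (an added convenience,
   not part of the original answer). */
size_t filterAlpha(char *line)
{
    size_t from, to = 0;

    assert(line != NULL);
    for (from = 0; line[from] != '\0'; from++) {
        /* The cast keeps the argument in the range 0..UCHAR_MAX, as
           C11 7.4 requires, even when plain char is signed. */
        if (isalpha((unsigned char)line[from])) {
            line[to++] = line[from];
        }
    }
    line[to] = '\0';    /* terminate the shortened string */
    return to;
}
```

Each character is read once and written at most once, so there is no shifting and no index to adjust after a removal. Which bytes above 127 count as letters depends on the current locale; in the default "C" locale none of them do.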