
I'm currently using the following code to scan each word in a text file, put it into a variable, and then do some manipulations with it before moving on to the next word. This works fine, but I'm trying to remove all characters that don't fall under A-Z / a-z. E.g., if "he5llo" was entered, I want the output to be "hello". If I can't modify fscanf to do it, is there a way of doing it to the variable once scanned? Thanks.

while (fscanf(inputFile, "%s", x) == 1)
Grijesh Chauhan
user2254988
  • That `fscanf` has one big problem: it is a potential buffer overrun. You should always use, for example, `fscanf(inputFile, "%99s", x)` when you have `char x[100]`. – hyde Apr 07 '13 at 17:01
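To keep the width in the format string from drifting out of sync with the buffer size, one common trick is to build it with the preprocessor's stringizing operator. A minimal sketch, assuming a hypothetical `WORD_MAX` constant and a hypothetical helper `read_word`:

```c
#include <stdio.h>

#define WORD_MAX 99            /* hypothetical: max characters per word */
#define STR_(x) #x             /* turns a token into a string literal */
#define STR(x) STR_(x)         /* expands x first, then stringizes it */

/* Reads one whitespace-delimited word of at most WORD_MAX characters into
   buf, which must hold WORD_MAX + 1 bytes. Returns what fscanf returns:
   1 on success, EOF at end of input. */
int read_word(FILE *in, char buf[WORD_MAX + 1]) {
    /* "%" STR(WORD_MAX) "s" becomes the literal "%99s" at compile time */
    return fscanf(in, "%" STR(WORD_MAX) "s", buf);
}
```

Changing `WORD_MAX` then updates both the buffer declaration (`char x[WORD_MAX + 1]`) and the format width together. A word longer than the limit is simply split across two reads.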

5 Answers


You can pass x to a function like this. First, a simple version for the sake of understanding:

// header needed for isalpha()
#include <ctype.h>

void condense_alpha_str(char *str) {
  int source = 0; // index of copy source
  int dest = 0; // index of copy destination

  // loop until original end of str reached
  while (str[source] != '\0') {
    // cast: isalpha() expects an unsigned char value (or EOF)
    if (isalpha((unsigned char)str[source])) {
      // keep only chars matching isalpha()
      str[dest] = str[source];
      ++dest;
    }
    ++source; // advance source always, whether char was copied or not
  }
  str[dest] = '\0'; // add new terminating 0 byte, in case string got shorter
}

It goes through the string in place, copying the chars that pass the isalpha() test and skipping (and thus removing) those that do not. To understand the code, it's important to realize that C strings are just char arrays, with a byte value of 0 marking the end of the string. Another important detail is that in C, arrays and pointers are in many (not all!) ways the same thing, so a pointer can be indexed just like an array. Also, this simple version rewrites every byte in the string, even when the string doesn't actually change.
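Because a pointer can be indexed like an array, and equally walked with `++`, the same in-place algorithm can also be written with a read pointer and a write pointer instead of two indices. A minimal sketch; the name `condense_alpha_ptr` is hypothetical:

```c
#include <ctype.h>

/* Same algorithm as condense_alpha_str, using a read pointer (src) and a
   write pointer (dst) that both start at the beginning of str. */
void condense_alpha_ptr(char *str) {
    const char *src = str;   /* next character to examine */
    char *dst = str;         /* next position to write a kept character */

    for (; *src != '\0'; ++src) {
        /* cast: isalpha() expects an unsigned char value or EOF */
        if (isalpha((unsigned char)*src)) {
            *dst++ = *src;   /* keep it and advance the write position */
        }
    }
    *dst = '\0';             /* terminate the (possibly shorter) string */
}
```

`dst` can never overtake `src`, which is why overwriting the string while reading it is safe.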


Then a more full-featured version, which takes the filter function as a parameter, only writes to memory if str changes, and returns a pointer to str like most library string functions do:

char *condense_str(char *str, int (*filter)(int)) {

  int source = 0; // index of character to copy

  // optimization: skip initial matching chars
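  // also stop at '\0', since a filter like isNotAlpha() returns true for it
  while (str[source] && filter((unsigned char)str[source])) {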
    ++source; 
  }
  // source is now index of first non-matching char or end-of-string

  // optimization: only do condense loop if not at end of str yet
  if (str[source]) { // '\0' is same as false in C

    // start condensing the string from first non-matching char
    int dest = source; // index of copy destination
    do {
      if (filter((unsigned char)str[source])) {
        // keep only chars matching given filter function
        str[dest] = str[source];
        ++dest;
      }
      ++source; // advance source always, whether char was copied or not
    } while (str[source]);
    str[dest] = '\0'; // add terminating 0 byte to match condensed string

  }

  // follow convention of strcpy, strcat etc, and return the string
  return str;
}

Example filter function:

// int parameter matches the int (*)(int) type condense_str expects
int isNotAlpha(int ch) {
    return !isalpha(ch);
}

Example calls:

char sample[] = "1234abc";
condense_str(sample, isalpha); // use a library function from ctype.h
// note: return value ignored, it's just convenience not needed here
// sample is now "abc"
condense_str(sample, isNotAlpha); // use custom function
// sample is now "", empty

// fscanf code from question, with buffer overrun prevention
char x[100];
while (fscanf(inputFile, "%99s", x) == 1) {
  condense_str(x, isalpha); // x modified in-place
  ...
}

Reference:

From the manual for int isalpha(int c):

Checks whether c is an alphabetic letter.
Return Value:
A value different from zero (i.e., true) if indeed c is an alphabetic letter. Zero (i.e., false) otherwise.
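One detail the manual excerpt above leaves out: the argument of isalpha() must be representable as an unsigned char or be EOF, so passing a plain char that holds a negative value (possible where char is signed, e.g. for non-ASCII bytes) is undefined behavior. isalpha() is also locale-dependent; only in the "C" locale is it guaranteed to match exactly A-Z and a-z. A minimal sketch of two helpers, both hypothetical names, one safe wrapper and one locale-independent test that follows the question's literal A-Z / a-z wording:

```c
#include <ctype.h>

/* Converts to unsigned char first, avoiding undefined behavior when plain
   char is signed and holds a negative byte. */
int char_isalpha(char c) {
    return isalpha((unsigned char)c);
}

/* Locale-independent test for exactly A-Z and a-z. The range comparisons
   assume the letters are contiguous, which holds in ASCII but is not
   guaranteed by the C standard for every character set. */
int is_ascii_letter(char c) {
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
}
```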

hyde
  • @RandyHoward If you think it's wrong, suggest how one should respond instead. hyde doesn't know whether the OP is asking for homework or for self-learning; hyde is just helping. – Grijesh Chauhan Apr 07 '13 at 17:03
  • @hyde I would suggest always explaining your code so that it helps the OP better. – Grijesh Chauhan Apr 07 '13 at 17:06
  • Cheers for answering, although I don't fully understand the example you have given, so I'll struggle to use it for my approach. – user2254988 Apr 07 '13 at 17:27
  • @user2254988 I modified the code to use index instead of pointer arithmetic. Is it any clearer now? – hyde Apr 07 '13 at 17:44
  • @user2254988 Does it help now? Ask if you have any doubts, and make sure you understand it completely. – Grijesh Chauhan Apr 07 '13 at 17:49
  • +1 - and a small set of changes makes this function much more general. Instead of hard-coding it to use `isalpha()`, pass it a pointer to a function (with the same prototype as `isalpha()` and other `ctype.h` character classification functions) and you can easily use this to filter on any class of characters, even a custom class of characters: `compress_str( char* str, int (*filter)(int))` – Michael Burr Apr 07 '13 at 17:55
  • Good, hyde, now it's quite helpful. – Grijesh Chauhan Apr 07 '13 at 17:56
  • @MichaelBurr Good idea! But for the OP I think we should keep the answer simple; he is already not very clear on it. – Grijesh Chauhan Apr 07 '13 at 17:58
  • Ah, thanks, much clearer now; I didn't know about char arrays ending in \0. So now I just have to pass the temporary scanned word to that function and add a return in there so the new alpha string can be returned and used. – user2254988 Apr 07 '13 at 18:07
  • @user2254988 Yeah, it modifies the string you pass in place. It's a safe operation, as the result string will be either of equal length or shorter. Would making the filter function parameterised be good, as MichaelBurr's comment suggests? – hyde Apr 07 '13 at 18:10
  • @MichaelBurr Yeah, I considered making the validation function a parameter, but decided against it for the sake of simplicity... I guess I could edit the answer to contain two versions, simple and full-featured. – hyde Apr 07 '13 at 18:11
  • Just realized it doesn't need a return as it's a pointer; still trying to get to grips with them! No, this is exactly what I needed, thanks, as I'm trying to keep it as compact and simple as possible. – user2254988 Apr 07 '13 at 18:25
  • @user2254988 I went and edited the answer anyway, now it has simple version, and more advanced and better optimized version with parameterized filter function (if you want the previous, it's still in the edit history of the answer). – hyde Apr 07 '13 at 18:29

luser droog's answer will work, but in my opinion it is more complicated than necessary.

For your simple example you could try this:

char x[100];
while (fscanf(inputFile, "%99[A-Za-z]", x) == 1) {   // read until a non-alpha character is found
   // x now holds one run of letters
   fscanf(inputFile, "%*[^A-Za-z]");  // discard non-alpha characters and continue
}
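Because the scanset stops at the first non-letter, this loop delivers each run of letters separately: for "he5llo" one read yields "he" and the next yields "llo", so runs that belong to one word still need to be joined if whole words are wanted. Also, input that begins with a non-letter makes the very first read fail and ends the loop at once; skipping non-letters before the first read avoids that. A minimal sketch of that variant, collecting the runs into one string; `collect_letter_runs` is a hypothetical name, and ranges like A-Z inside %[ ] are assumed to be supported (common, but implementation-defined in standard C):

```c
#include <stdio.h>

/* Reads letter runs from in, skipping non-letters (including any at the very
   start), and appends each run plus a space to out, which has room for
   outsize bytes (outsize >= 1). Returns the number of runs stored. */
int collect_letter_runs(FILE *in, char *out, size_t outsize) {
    char run[100];
    size_t used = 0;
    int count = 0;

    out[0] = '\0';
    fscanf(in, "%*[^A-Za-z]");                 /* skip leading non-letters */
    while (fscanf(in, "%99[A-Za-z]", run) == 1) {
        int n = snprintf(out + used, outsize - used, "%s ", run);
        if (n < 0 || (size_t)n >= outsize - used)
            break;                             /* out is full, stop */
        used += (size_t)n;
        ++count;
        fscanf(in, "%*[^A-Za-z]");             /* skip non-letters between runs */
    }
    return count;
}
```

For the input "5he5llo world!" this stores "he llo world " and returns 3, which shows the splitting at the digit.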
Jonatan Goebel

You can use the isalpha() function to check each of the characters contained in the string.
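If the original word should stay untouched, the same per-character isalpha() check can copy the letters into a separate buffer instead of working in place. A minimal sketch; `copy_alpha` and its truncation behavior are hypothetical choices, not a library function:

```c
#include <ctype.h>
#include <stddef.h>

/* Copies only the alphabetic characters of src into dst, which has room for
   dstsize bytes (dstsize >= 1). The result is always '\0'-terminated; letters
   that do not fit are dropped. Returns the number of characters written,
   not counting the terminator. */
size_t copy_alpha(char *dst, size_t dstsize, const char *src) {
    size_t n = 0;
    for (; *src != '\0'; ++src) {
        /* isalpha() needs an unsigned char value, hence the cast */
        if (isalpha((unsigned char)*src) && n + 1 < dstsize) {
            dst[n++] = *src;
        }
    }
    dst[n] = '\0';
    return n;
}
```

For example, `copy_alpha(out, sizeof out, "he5llo")` stores "hello" in `out` and returns 5.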

gipi

The scanf family of functions won't do this. You'll have to loop over the string and use isalpha to check each character, and "remove" a character with memmove by copying the rest of the string forward over it.
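The memmove approach described above can be sketched as follows; `remove_nonalpha_memmove` is a hypothetical name. Each removal shifts the whole remaining tail, including the terminating '\0', one position left:

```c
#include <ctype.h>
#include <string.h>

/* Removes non-alphabetic characters from str in place. Simple, but every
   removal moves the rest of the string, so it is O(n^2) in the worst case,
   whereas the two-index loop in the first answer is O(n). */
void remove_nonalpha_memmove(char *str) {
    char *p = str;
    while (*p != '\0') {
        if (isalpha((unsigned char)*p)) {
            ++p;                                   /* keep it, move on */
        } else {
            /* strlen(p + 1) + 1 bytes: the rest of the string plus '\0' */
            memmove(p, p + 1, strlen(p + 1) + 1);
            /* do not advance: p now holds the next unexamined character */
        }
    }
}
```

memmove (rather than memcpy) is required here because the source and destination regions overlap.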

Maybe scanf can do it after all. When input fails to match a conversion, scanf and friends leave the first non-matching character unread on the input stream, so the next call starts from it.
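The unread-character behavior can be checked directly: after a scanset conversion stops, the character that ended it is still the next one in the stream. A minimal sketch, using a temporary file from tmpfile() as input; `first_unmatched` and its -2 error value are hypothetical, and ranges like A-Z inside %[ ] are assumed to be supported (implementation-defined in standard C):

```c
#include <stdio.h>

/* Writes text to a temporary file, reads a run of letters with "%99[A-Za-z]",
   and returns the next character left in the stream (or EOF). Returns -2 if
   the temporary file cannot be created or no letters could be read. */
int first_unmatched(const char *text) {
    char letters[100];
    int next;
    FILE *f = tmpfile();
    if (f == NULL)
        return -2;
    fputs(text, f);
    rewind(f);
    if (fscanf(f, "%99[A-Za-z]", letters) != 1) {
        fclose(f);
        return -2;
    }
    next = fgetc(f);      /* the character that stopped the conversion */
    fclose(f);
    return next;
}
```

For "abc5def" the scan stores "abc" and the following fgetc() returns '5', so nothing was lost.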

This example uses scanf as a regex-like filter on the stream. The * assignment-suppression flag means there's no storage destination for the negated pattern; the matched characters are simply consumed.

#include <stdio.h>
#include <string.h>

int main(void) {
    enum { BUF_SZ = 80 };                 // buffer size in one place
    char buf[BUF_SZ] = "";
    const char fmtfmt[] = "%%%d[A-Za-z]"; // format string for the format string
    char fmt[32];                         // storage for the real format string
    const char nfmt[] = "%*[^A-Za-z]";    // negated pattern, matched text is discarded

    char *p = buf;                        // where the next run of letters goes
    // continue while there is room for at least one more letter plus '\0'
    while (p < buf + BUF_SZ - 1) {
        // field width = free space minus one byte for the terminating '\0'
        sprintf(fmt, fmtfmt, (int)(buf + BUF_SZ - 1 - p));
        if (scanf(fmt, p) == EOF)         // scan for format into buffer via pointer
            break;
        if (scanf(nfmt) == EOF)           // scan for negated format
            break;
        p += strlen(p);                   // adjust pointer past the stored letters
    }
    printf("%s\n", buf);
    return 0;
}
luser droog

I'm working on a similar project, so you're in good hands! Strip the word down into separate parts.

Blank spaces aren't an issue, because cin reads one word at a time. To find the non-alpha characters in each word, you can use a test like

 if( !isalpha(c) )

Then step the index past that character and add the remaining parts to a temporary string holder. You can index characters in a string like an array, so finding the non-alpha characters and building the new string is easy.

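 std::string x = "hell5o";
 std::size_t pos = 0;  // loop through until you find a non-alpha & mark that pos
 while (pos < x.size() && std::isalpha(static_cast<unsigned char>(x[pos]))) ++pos;
 // store the different parts of the string, before and after the non-alpha character
 std::string tempLeft = x.substr(0, pos);
 std::string tempRight = pos < x.size() ? x.substr(pos + 1) : std::string();
 x = tempLeft + tempRight;  // x is now "hello" (only the first non-alpha char is removed)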
simpleb