I have an input file with the following text.
<html>
<head><title>My web page</title></head>
<body>
<p>Foo bar<br />
Hi there!<br />
How is it going?
</p>
<p>
I'm fine. And you?
</p>
<p>
Here is a <a href="somelink.html">link</a> to follow.
</p>
</body>
</html>
I am tasked with removing the HTML tags. For a `<br />` I should output one `\n`, and for a `<p>` I should output two `\n`. My code works fine, except that it counts `</p>` as a `<p>`, and I do not want to output a `\n` for `</p>`. I have been racking my brain for the last hour trying to think of a way to account for this, and I cannot.
Might someone offer a suggestion on how to account for this?
void main(){
    FILE *ifp, *ofp;//input/output file pointers
    int c, c1, c2;//variables used to store and compare input characters, c2 is used only to check for a </p> tag
    int n = 0;
    int count = 0;//counter for total characters in file
    int putCount = 0;//counter for number of outputted characters
    int discardTag = 0; //counter to store number of discarded tags
    float charDiff = 0;//variable to store file size difference
    int br = 0; //counter for <br />
    int p = 0;//counter for <p>

    ifp = fopen("prog1in1.txt", "r");
    ofp = fopen("prog1in1out.txt", "w");

    do{
        c = getc(ifp);
        count ++;
        //compares the current character to '<' if its found starts a while loop
        if(c == '<'){
            //loops until it reaches the end of the tag
            while( c != '>'){
                count ++;
                c = getc(ifp);
                /*compares the first two characters to determine if it is a <br /> tag
                  if true outputs a null line and counts the number of <br /> tags*/
                if(c == 'b' ){
                    c = getc(ifp);
                    count ++;
                    if( /*c == 'b' &&*/ c == 'r'){
                        br ++;
                        c = '\n';
                        putc( c , ofp);
                        count += 1;
                    }
                }//end br if
                /*else if if the tag is <p> outputs two null lines
                  and counts the number of <p> tags*/
                else if ( c == 'p' ){
                    p ++;
                    c = '\n';
                    putc( c ,ofp);
                    putc( c, ofp);
                    count +=2;
                }//end p if
                //counts the number of tags that are not <br />
                else{ //if ( c2 != 'b' && c1 != 'r' || c1 != 'p'){
                    discardTag ++;
                }// end discard tag
            }//end while
        }
        /*checks if the current character is not '>'
          if true outputs the current character*/
        if( c != '>'){
            putc( c , ofp);
            putCount++;
        }
        else if( c == EOF){
            //does nothing here yet
        }
    }while(c != EOF);

    fclose(ifp);
}//end main
` and `` and the "br" in ``. You might want to read the entire "word" right after `<` to prevent that (this also solves your current problem).
– Jongware Feb 09 '14 at 23:01
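To illustrate the suggestion above: the loop in the question tests every character between `<` and `>`, so any `p` inside a tag matches, including the one in `</p>`. A minimal sketch of reading the whole tag name first might look like the following. The names `newlines_for_tag`, `strip_tags` and `TAG_NAME_MAX` are made up for this example and are not part of the original program, and the sketch assumes tags never contain a literal `>` inside an attribute value.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

#define TAG_NAME_MAX 16 /* assumed upper bound; longer names are truncated */

/* Number of newlines to emit for a tag name: 2 for "p", 1 for "br", else 0.
   A closing tag arrives here as "/p", so it matches neither case. */
int newlines_for_tag(const char *name)
{
    if (strcmp(name, "p") == 0)
        return 2;
    if (strcmp(name, "br") == 0)
        return 1;
    return 0;
}

/* Copy text from in to out, dropping tags (hypothetical helper). */
void strip_tags(FILE *in, FILE *out)
{
    int c;

    assert(in != NULL && out != NULL);

    while ((c = getc(in)) != EOF) {
        if (c != '<') {
            putc(c, out); /* ordinary text is copied unchanged */
            continue;
        }

        /* Read the entire "word" after '<': up to whitespace, '>' or EOF. */
        char name[TAG_NAME_MAX];
        size_t len = 0;
        while ((c = getc(in)) != EOF && c != '>' && !isspace(c)) {
            if (len < sizeof name - 1)
                name[len++] = (char)tolower(c);
        }
        if (len > 0 && name[len - 1] == '/')
            len--; /* "<br/>" is read as "br/", so drop the trailing slash */
        name[len] = '\0';

        /* Skip any attributes up to the closing '>'. */
        while (c != EOF && c != '>')
            c = getc(in);

        for (int i = newlines_for_tag(name); i > 0; i--)
            putc('\n', out);
    }
}
```

With the file pointers from the question, this would be called as `strip_tags(ifp, ofp);` after both `fopen` calls succeed. Because the decision is made on the complete name, `</p>`, and any other tag that merely contains a `p` or `br`, produces no newline.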