My program reads the tags in an HTML file and prints each tag name along with the number of times it occurs. For this problem, a tag is a name made of alphanumeric characters that begins immediately after '<' and ends at either '>' or a space.
#include <stdio.h>
#include <stdlib.h>
#include <ctype.h>
#include <string.h>

#define MAX_TAG_LEN 10
#define MAX_TAGS 100

void htagsA3()
{
    char c;
    int within_tag = 0;
    char tagName[MAX_TAG_LEN];
    int tagNameLen = 0;
    char tags[MAX_TAGS][MAX_TAG_LEN]; //stores tag names
    int tagCounts[MAX_TAGS]; //stores count of each tag
    int numOfTags = 0;
    while((c = getchar()) != EOF)
    {
        if(c == '<')
        {
            within_tag = 1;
            tagNameLen = 0;
        }
        else if(c == '>' || c == ' ')
        {
            within_tag = 0;
            tagName[tagNameLen] = '\0';
            int i;
            for(i=0; i<numOfTags; i++)
            {
                if(strcmp(tags[i], tagName) == 0)
                {
                    tagCounts[i]++;
                    break;
                }
            }
            if(i == numOfTags)
            {
                strncpy(tags[numOfTags], tagName, MAX_TAG_LEN);
                tagCounts[numOfTags] = 1;
                numOfTags++;
            }
        }
        else if(within_tag)
        {
            while(c != ' ')
            {
                if(isalnum(c) && tagNameLen < MAX_TAG_LEN)
                {
                    tagName[tagNameLen] = c;
                    tagNameLen++;
                }
            }
        }
    }
    printf("HTML Tags Found:\n");
    int i;
    for(i=0; i<numOfTags; i++)
    {
        printf("%s: %d\n", tags[i], tagCounts[i]);
    }
}

int main()
{
    htagsA3();
}
I want to keep adding characters to the tag name until a space is seen, so I used while(c != ' '). When I compile and run this, the command prompt hangs on a blank line. Without the while loop, the program displays the right tag names, but the counts are wrong: the tag counter is also incremented on spaces, and I only want to count how many times each tag appears. I am using input redirection to feed an HTML file to the program when running it. Please help me find the errors.
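The hang can be traced to the inner loop itself: nothing inside while(c != ' ') assigns a new value to c, so once the loop is entered with a non-space character its condition can never become false. The sketch below shows one way such a loop could be written so that it reads a fresh character on every pass. The helper name read_tag_name and its parameters are hypothetical and not part of the original program, and it takes a FILE * (stdin would work) only to keep the example easy to exercise.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>

/*
 * Hypothetical helper (an illustration, not the original code).
 * Reads the rest of a tag name from `in`, given its first character.
 * Stores at most cap - 1 characters in `name`, null-terminates it,
 * and returns the character that ended the name (or EOF).
 */
int read_tag_name(FILE *in, int first, char *name, size_t cap)
{
    size_t len = 0;
    int c = first;   /* int rather than char, so EOF stays distinguishable */

    assert(name != NULL && cap > 0);

    while (c != EOF && isalnum(c)) {
        if (len + 1 < cap) {
            name[len++] = (char)c;
        }
        c = getc(in);   /* the step missing from while(c != ' ') */
    }
    name[len] = '\0';
    return c;
}
```

A caller would invoke this only when the character right after '<' is alphanumeric, and would count the tag only if the returned terminator is '>' or a space.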
Here is a sample output:
HTML Tags Found:
body: 4
div: 2
p: 6
b: 4
span: 16
The correct counts should actually be:
body: 1
div: 1
p: 2
b: 2
span: 2
Here is the content of the sample HTML file used as input:
<body lang=EN-CA link=blue vlink="#954F72">
<div class=WordSection1>
<p class=MsoNormal><b><span lang=EN-US style='font-size:14.0pt;font-family: "Times New Roman",serif'>CS 2263</span></b></p>
<p class=MsoNormal><b><span lang=EN-US style='font-size:14.0pt;font-family: "Times New Roman",serif'>Assignment 1</span></b></p>
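For reference, here is a minimal sketch of a scanner that should yield the expected counts for the sample file above. It assumes the tag definition given at the top of the question: a name counts only if it starts right after '<' and is ended directly by '>' or a space, so closing tags such as </p> are skipped because '/' is not alphanumeric. The names struct tag_counter, record and feed are illustrative and do not come from the original program. It keeps a three-way state instead of an inner loop, so spaces inside attributes never touch the counters.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

#define MAX_TAG_LEN 10   /* same limits as the original program */
#define MAX_TAGS 100

/* Scanner states: outside any tag, right after '<', or inside a name. */
enum state { OUTSIDE, AFTER_LT, IN_NAME };

struct tag_counter {
    enum state st;
    char name[MAX_TAG_LEN + 1];            /* +1 leaves room for '\0' */
    int len;
    char tags[MAX_TAGS][MAX_TAG_LEN + 1];  /* distinct tag names seen */
    int counts[MAX_TAGS];                  /* occurrences of each name */
    int num;
};

/* Record one occurrence of the name currently held in tc->name. */
static void record(struct tag_counter *tc)
{
    int i;

    assert(tc->len >= 0 && tc->len <= MAX_TAG_LEN);
    tc->name[tc->len] = '\0';
    for (i = 0; i < tc->num; i++) {
        if (strcmp(tc->tags[i], tc->name) == 0) {
            tc->counts[i]++;
            return;
        }
    }
    if (tc->num < MAX_TAGS) {              /* extra names are dropped */
        strcpy(tc->tags[tc->num], tc->name);
        tc->counts[tc->num] = 1;
        tc->num++;
    }
}

/* Feed one input character (never EOF) to the scanner. */
void feed(struct tag_counter *tc, int c)
{
    if (c == '<') {
        tc->st = AFTER_LT;
        tc->len = 0;
    } else if (tc->st != OUTSIDE && isalnum((unsigned char)c)) {
        if (tc->len < MAX_TAG_LEN) {       /* longer names are truncated */
            tc->name[tc->len++] = (char)c;
        }
        tc->st = IN_NAME;
    } else {
        /* Count only when '>' or a space directly ends a name. */
        if (tc->st == IN_NAME && (c == '>' || c == ' ')) {
            record(tc);
        }
        tc->st = OUTSIDE;
    }
}
```

A driver could zero-initialize a struct tag_counter, store each result of getchar() in an int, pass it to feed until EOF is returned, and then print tags[i] and counts[i] for i from 0 to num - 1.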