I am trying to extract a snippet from the source code of a website, and I want to delete all the spaces and tabs in front of the tags on each line. I copy each line into a char array and check it character by character with isspace (I also tried comparing against '\t' and ' ') until I reach some other character such as '<', counting how many spaces and tabs there are. Then I create a second char array and copy the line (separator) into it, skipping the leading whitespace (with [chars+k]). This works pretty well, but if a line starts with more than 5 tabs it doesn't work properly. I have absolutely no idea where the fault is.
for(int i = 0; i < lines; i++){
    getline(codefile, buf);
    char *separator = new char[buf.size()+1];
    separator[buf.size()] = 0;
    memcpy(separator, buf.c_str(), buf.size());
    int chars = 0;
    for(int j = 0; j <= sizeof(separator); j++){
        if(isspace(separator[j])){
            chars++;
        }
        else{
            break;
        }
    }
    char *newbuf = new char[buf.size()-chars+1];
    newbuf[buf.size()-chars] = 0;
    for(int k = 0; k <= buf.size()-chars+1; k++){
        newbuf[k] = separator[chars+k];
    }
    if(i > lcounter){
        cout << newbuf << i << endl;
    }
}
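For comparison, here is a minimal sketch of the behaviour I am after, written with std::string instead of raw char arrays. It assumes only leading spaces and tabs should be removed; `strip_leading_whitespace` is a hypothetical helper name, not part of my original code. One thing that may be relevant: in the loop above, `sizeof(separator)` is the size of a pointer (commonly 4 or 8 bytes), not the length of the line, which would fit with the cutoff at around five characters.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (name chosen for illustration only):
// returns a copy of `line` without its leading spaces and tabs.
std::string strip_leading_whitespace(const std::string& line) {
    // Index of the first character that is neither a space nor a tab.
    const std::size_t first = line.find_first_not_of(" \t");
    if (first == std::string::npos) {
        return "";  // the line is empty or consists only of spaces/tabs
    }
    return line.substr(first);
}
```

Inside the loop this would presumably replace both char arrays, e.g. `cout << strip_leading_whitespace(buf) << i << endl;`, and nothing would need to be allocated with `new`.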
Here is the snippet of the source code from the website. You can see the problem at the img tag, the closing figure tag and the p tag, which are each preceded by more than 5 tabs (sorry, I had to censor the content).
<div class="xxx">
<article class="xxx" data-id="0">
<a href="link" class="tile" style="background-image:url('x.jpg');background-position:left center" data-more="<a href=x" data-clicks="<i class="fa fa-eye"></i>" data-teaserimg="x.jpg">
<time datetime="2015">
<span>2015</span>
</time>
<h1 class="title">
<span>x</span>
</h1>
<div class="x">x</div>
<div class="x">x</div>
<div class="x">
<figure class="x">
<img src="x.jpg" width="1" height="1" alt="">
</figure>
<p>
<strong>x</strong>xxx
</p>
</div>
</a>
Sorry, I can't post a picture; I hope it is understandable.