Here's an approach that depends heavily on how your HTML looks and on which part of it you want to implement sentence highlighting for. If it's just one text block with multiple lines at the top of the page and nothing more, I'd say you can replace the whole block with an updated HTML block, maybe even a single <p>.
combining everything into one big string
Find this part in the HTML created by PDF.js, iterate over all of its child divs and combine their text into one big string with plain string concatenation. One problem might be accessing the child divs. If the HTML is rendered by an Angular application, you can mark DOM elements with template reference variables like #textBlock. You can then access those elements with @ViewChild, which gives you an ElementRef. Its nativeElement is a regular DOM node, so you can walk down its subtree with properties like childNodes and read the text of text nodes through their data property. This should help you extract the text and build the string.
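A minimal sketch of the collecting step, assuming #textBlock sits on the container of the PDF.js text divs. To keep it testable outside a browser, the hypothetical collectText function accepts a small structural type (TextSourceNode, also hypothetical) that real DOM nodes should satisfy as well. It joins the pieces with a space instead of concatenating them directly, on the assumption that separate divs usually hold separate lines, which would otherwise run together.

```typescript
// Minimal structural view of a DOM node. Real Node/Text objects should
// satisfy it too, which lets the same function run on plain objects in tests.
interface TextSourceNode {
  nodeType: number;                      // 3 === Node.TEXT_NODE
  data?: string;                         // only set on text nodes
  childNodes: ArrayLike<TextSourceNode>;
}

const TEXT_NODE = 3;

// Hypothetical helper: collects the text of every text node below `root`
// in document order, skips whitespace-only pieces and joins the rest.
function collectText(root: TextSourceNode, separator = " "): string {
  const parts: string[] = [];
  const walk = (node: TextSourceNode): void => {
    if (node.nodeType === TEXT_NODE) {
      const text = (node.data ?? "").trim();
      if (text.length > 0) {
        parts.push(text);
      }
      return;
    }
    for (let i = 0; i < node.childNodes.length; i++) {
      walk(node.childNodes[i]);
    }
  };
  walk(root);
  return parts.join(separator);
}
```

In the component this would be something like collectText(this.textBlock.nativeElement). If plain concatenation is really all you need, nativeElement.textContent already returns the text of the whole subtree in one string.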
split the text into sentences
Next, split this big string into sentences. Since sentences end with fixed punctuation marks like . ! and ?, we can use a regular expression to find the right spots. The string method match with a global (g) regular expression should do the job here: it returns an array of every substring the pattern matches, which is exactly the array of sentences we want. The regex may look something like this, although I'm not 100% sure it covers every case, because I just found it in another answer:
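const bigTextBlock = "Big text block. No more divs. Only a string";
// the |[^.!?]+$ alternative keeps a trailing sentence without final punctuation,
// which the first pattern alone would drop ("Only a string" here)
const sentences = (bigTextBlock.match(/[^.!?]+[.!?]+|[^.!?]+$/g) || []).map(s => s.trim());
// ["Big text block.", "No more divs.", "Only a string"]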
remove the current divs
That's not too bad for a start. Next we want to remove the current divs and create new HTML tags. There are two ways to do this, and in both cases we need a reference to the parent div of the text divs from before, which we probably already have from the @ViewChild step.
The first option is to bind something like [innerHTML]. This removes the old divs and creates new ones, but it gets tricky when you want to add a click handler, because markup inserted this way bypasses Angular: (click) bindings inside that string are not compiled.
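If you go the [innerHTML] route, the sentences have to be turned into one markup string first. A minimal sketch with hypothetical names (sentencesToHtml, escapeHtml and the sentence class are illustrations, not part of any API): each sentence is escaped, since text extracted from a PDF can contain characters like < or & that would otherwise be parsed as markup.

```typescript
// Escapes the characters that are significant in HTML text and attributes.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;") // must run first, or the other entities get double-escaped
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Hypothetical helper: builds one <p> with a <span class="sentence"> per sentence,
// ready to be bound through [innerHTML].
function sentencesToHtml(sentences: string[]): string {
  const spans = sentences.map(s => `<span class="sentence">${escapeHtml(s.trim())}</span>`);
  return `<p>${spans.join(" ")}</p>`;
}
```

In the template the result could be bound as <div [innerHTML]="sentenceHtml"></div>, where sentenceHtml is a hypothetical component property holding the string.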
The other way is to manipulate the children of your reference element directly. For this we can use Renderer2, which Angular injects through the constructor like a service. It can create new elements (createElement), remove children (removeChild) and attach event listeners to nodes (listen), which we need to do anyway. For now we only want to remove the old child nodes.
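A minimal sketch of that removal step, assuming the parent is the element behind your reference (for example this.textBlock.nativeElement). The hypothetical removeAllChildren only relies on a removeChild(parent, child) method, which Renderer2 provides, so a fake renderer can stand in for it in a test. The detail worth copying is the snapshot: a DOM childNodes list is live and shrinks as nodes are removed, so removing while looping over it by index would skip every second node.

```typescript
// The only Renderer2 method this helper needs; Renderer2 should satisfy
// this shape, and a fake object can satisfy it in a test.
interface ChildRemover {
  removeChild(parent: unknown, oldChild: unknown): void;
}

interface ParentNodeLike {
  childNodes: ArrayLike<unknown>;
}

// Hypothetical helper: removes every child of `parent` through the renderer.
function removeAllChildren(renderer: ChildRemover, parent: ParentNodeLike): void {
  // Copy first: a DOM childNodes list is live and shrinks with each removal.
  const children: unknown[] = [];
  for (let i = 0; i < parent.childNodes.length; i++) {
    children.push(parent.childNodes[i]);
  }
  for (const child of children) {
    renderer.removeChild(parent, child);
  }
}
```

In the component this would be called as removeAllChildren(this.renderer, this.textBlock.nativeElement).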
create adjusted html
Now that every sentence is isolated, we can create one big <p> element that contains a <span> for every sentence. This way we can give a span an extra CSS class when the user clicks inside it, which gives us a highlight for every sentence. As stated before, the HTML could be placed through [innerHTML] or by creating the elements as children of our reference. In both cases we need Renderer2 to make each <span> listen to click events. Here's some code that combines the span creation and adding the listener, both through Renderer2 (here #textBlock sits on the <p> itself):
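import { Component, ElementRef, Renderer2, ViewChild } from '@angular/core';

@Component({
  selector: 'app-sentence-highlight', // hypothetical selector
  template: '<p #textBlock></p>'      // the big <p> that receives one <span> per sentence
})
export class SentenceHighlightComponent {
  @ViewChild('textBlock') textBlock!: ElementRef<HTMLElement>;

  constructor(private renderer: Renderer2) { }

  createSpans(sentences: string[]) {
    const parent = this.textBlock.nativeElement;
    sentences.forEach(sentence => {
      // create the elements
      const span = this.renderer.createElement('span');
      const spanText = this.renderer.createText(sentence);
      // append the sentence to the span
      this.renderer.appendChild(span, spanText);
      // append the span to the parent, plus a space so sentences don't run together
      this.renderer.appendChild(parent, span);
      this.renderer.appendChild(parent, this.renderer.createText(' '));
      // listen to clicks on this sentence
      this.renderer.listen(span, 'click', () => {
        // set the highlight class (define .highlighted in the component's stylesheet)
        this.renderer.addClass(span, 'highlighted');
      });
    });
  }
}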
I know this is a lot of work and it gets tricky in some parts, but this is probably how I would handle it. Again, it depends heavily on how your HTML currently looks and how you want it to look after the changes.
block that combines multiple lines of text? – Benedikt Schmidt Nov 29 '17 at 01:25
block around multiple lines as you said. – Oskar Martin Nov 29 '17 at 10:00