
I use PDF.js to render a PDF to HTML, with a text layer on top. What I want to implement is that when you click on a sentence, a class is added to that sentence, and I want to do this in an Angular 4 component. I have run into a problem because PDF.js renders the text line by line (every line is in a different div).

Example of pdf in html:

<div style="left: 86.0208px; top: 481.589px; font-size: 8.03709px; font-family: serif; transform: scaleX(1.00581);">
  timestamp server to generate computational proof of the chronological 
  order of transactions.  The
</div>
<div style="left: 86.0208px; top: 490.899px; font-size: 8.03709px; font-family: serif; transform: scaleX(0.9335);">
  system   is   secure   as   long  
  as   honest   nodes   collectively   control   more   CPU   
  power   than   any
</div>

Any idea how I should implement this functionality? The main goal is to highlight exactly the sentence that was clicked, and to do it by manipulating the HTML.

Oskar Martin
  • From your code example I'd say that a generated div can currently contain parts of multiple sentences. So with the HTML as it is, you can't just style the surrounding div, because then parts of another sentence may be highlighted too, right? – Benedikt Schmidt Nov 28 '17 at 00:24
  • This is exactly the problem I am facing. I have thought about writing a function that checks the previous and next elements, if necessary, to find the periods, but I am not sure that is the right way to do it. Any help with implementing the sentence finding would be very much appreciated. – Oskar Martin Nov 28 '17 at 07:38
  • Is it crucial that the div structure created by PDF.js stays the same, or would it be possible to update it, for example by creating one big block that combines multiple lines of text? – Benedikt Schmidt Nov 29 '17 at 01:25
  • It is not crucial that the layout stays the same, so you can update the HTML, for example by adding a block around multiple lines as you said. – Oskar Martin Nov 29 '17 at 10:00

1 Answer


Here's an approach that depends heavily on what your HTML looks like and on which part of it should get the sentence highlighting. If it's just one text block with multiple lines at the top of the page and nothing more, I'd say you can replace the whole block with an updated HTML block, maybe even a single <p>.

combining all to one big string

First, locate this part of the HTML created by PDF.js, iterate over all the child divs, and concatenate their text into one big string. One problem might be getting access to the child divs. If the HTML is rendered by an Angular application, you can mark a DOM element with a template reference variable such as #textBlock and then access it with @ViewChild. The resulting reference lets you walk down the element's subtree through properties such as childNodes and data, which may be helpful for extracting the text and concatenating the string.
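The concatenation step could look roughly like the sketch below. It is only an illustration: combineLines is a hypothetical helper name, and it assumes the text of each line div has already been read into a string array (for example from each child's textContent).

```typescript
// Hypothetical helper: joins the text of each rendered line into one string.
// `lines` is assumed to hold the text content of the PDF.js line divs, in order.
function combineLines(lines: string[]): string {
  return lines
    .map(line => line.replace(/\s+/g, ' ').trim()) // collapse runs of whitespace
    .filter(line => line.length > 0)               // skip empty divs
    .join(' ');                                    // one space between lines
}

const combined = combineLines([
  '  timestamp server to generate computational proof of the chronological \n  order of transactions.  The',
  '  system   is   secure   as   long  \n  as   honest   nodes',
]);
```

Joining with a single space is a simplification: a word that the PDF hyphenates at the end of a line would not be rejoined by this sketch.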

split the text into sentences

The next step is to split this big string into sentences. Since sentences end with fixed punctuation marks like . ! ? we can use a regular expression to split at the right spots. The string function match in combination with a regular expression should do the job here; as a result we want an array of sentences. The regex may look something like this, although I'm not 100% sure it works, because I just found it in this answer:

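const bigTextBlock = "Big text block. No more divs. Only a string";
// match() returns null when nothing matches, so fall back to an empty array.
// Trailing text without closing punctuation ("Only a string") is not matched.
const sentences = bigTextBlock.match(/[^.!?]+[.!?]+/g) || [];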
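One thing to watch with the regex above is that text without closing punctuation at the end of the block is dropped. The following sketch is a variation, not the regex from the linked answer: splitSentences is a hypothetical helper that also keeps such a trailing fragment and trims the whitespace left over from splitting.

```typescript
// Hypothetical helper: splits a text block into sentences on . ! ?
// and keeps a trailing fragment that has no closing punctuation.
function splitSentences(text: string): string[] {
  // Either a run of text followed by punctuation, or a run that reaches the end.
  const parts: string[] = text.match(/[^.!?]+(?:[.!?]+|$)/g) || [];
  return parts
    .map(part => part.trim())          // drop the space after the previous sentence
    .filter(part => part.length > 0);  // drop whitespace-only matches
}

const result = splitSentences('Big text block. No more divs. Only a string');
```

Like any purely punctuation-based split, this treats abbreviations such as "e.g." as sentence ends, so real PDF text may need extra rules.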

remove the current divs

That's not too bad for a start. Next we want to remove the current divs and create new HTML elements. There are two ways to do this. In both cases we need a reference to the parent element of the text block divs, which we probably already have from the previous step.

The first option is to set something like [innerHTML]. This removes the old divs and creates the new markup, but it gets tricky when you want to implement an onclick action, because this way we bypass Angular.

The other way is to manipulate the children through your reference element. For this we can use Renderer2, which is injected as a service. It can create new elements, remove children, and attach click listeners to nodes, which is what we probably need to do anyway. For now we only want to remove the old childNodes.
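Removing the old children could be sketched as follows. This is an assumption-laden illustration: clearChildren and the ChildRemover interface are hypothetical names, the interface only mirrors the removeChild(parent, oldChild) method that Renderer2 provides, and the fake objects at the end exist purely so the sketch runs without Angular or a browser.

```typescript
// Minimal subset of Renderer2 used here; an injected Renderer2 can be passed in.
interface ChildRemover {
  removeChild(parent: any, oldChild: any): void;
}

// Hypothetical helper: removes every child node of `parent` through the renderer.
function clearChildren(renderer: ChildRemover, parent: { childNodes: ArrayLike<any> }): void {
  // Copy first: childNodes is a live list that shrinks while nodes are removed.
  const children: any[] = Array.prototype.slice.call(parent.childNodes);
  for (const child of children) {
    renderer.removeChild(parent, child);
  }
}

// Stand-in objects (not Angular APIs) to exercise the helper.
const fakeParent = { childNodes: ['div1', 'div2', 'div3'] };
const fakeRenderer: ChildRemover = {
  removeChild(parent, child) {
    const index = parent.childNodes.indexOf(child);
    if (index >= 0) { parent.childNodes.splice(index, 1); }
  },
};
clearChildren(fakeRenderer, fakeParent);
```

In a component this would presumably be called as clearChildren(this.renderer, this.textBlock.nativeElement).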

create adjusted html

Now that every sentence is isolated, we can create one big <p> element that contains a <span> element for every sentence. This way we can give a span an additional CSS class when the user clicks inside that part of the text, and so highlight each sentence individually. As stated before, the HTML could be placed through [innerHTML] or by creating the elements as children of our reference. In both cases we need Renderer2 to make the <span> listen for click events. Here's some code that combines the span creation and adding the listener, both through Renderer2.

import { ElementRef, Renderer2, ViewChild } from '@angular/core';

// inside the component class; the template needs an element marked with #textBlock
@ViewChild('textBlock') textBlock: ElementRef;

constructor(private renderer: Renderer2) { }

createSpans(sentences: string[]) {
    sentences.forEach(sentence => {
        // create the span element and its text node
        const span = this.renderer.createElement('span');
        const spanText = this.renderer.createText(sentence);
        // append the sentence text to the span
        this.renderer.appendChild(span, spanText);
        // append the span to the parent element
        this.renderer.appendChild(this.textBlock.nativeElement, span);
        // listen for clicks on this sentence
        this.renderer.listen(span, 'click', () => {
            // add the highlight CSS class to the clicked span
            this.renderer.addClass(span, 'highlighted');
        });
    });
}
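The click handler above only ever adds a highlight. If only one sentence should be highlighted at a time, the previously clicked span has to lose its class again. Here is a minimal sketch of that bookkeeping; SentenceHighlighter and ClassToggler are hypothetical names, the interface only mirrors the addClass and removeClass methods of Renderer2, and the fake objects stand in for real spans so the sketch runs on its own.

```typescript
// Minimal subset of Renderer2 used here; an injected Renderer2 can be passed in.
interface ClassToggler {
  addClass(el: any, name: string): void;
  removeClass(el: any, name: string): void;
}

// Hypothetical helper: keeps at most one element highlighted at a time.
class SentenceHighlighter {
  private current: any = null;

  constructor(private renderer: ClassToggler, private cssClass: string = 'highlighted') {}

  select(span: any): void {
    // Un-highlight the previously selected sentence, if it is a different one.
    if (this.current && this.current !== span) {
      this.renderer.removeClass(this.current, this.cssClass);
    }
    this.renderer.addClass(span, this.cssClass);
    this.current = span;
  }
}

// Stand-in objects (not Angular APIs) to exercise the helper.
const fakeToggler: ClassToggler = {
  addClass(el, name) { if (el.classes.indexOf(name) < 0) { el.classes.push(name); } },
  removeClass(el, name) { el.classes = el.classes.filter((c: string) => c !== name); },
};
const firstSpan = { classes: [] as string[] };
const secondSpan = { classes: [] as string[] };
const highlighter = new SentenceHighlighter(fakeToggler);
highlighter.select(firstSpan);
highlighter.select(secondSpan);
```

Inside the component, the listener body would then call something like highlighter.select(span) instead of adding the class directly.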

I know this is a lot to do and it gets tricky in places, but this is probably how I would handle it. Again, it depends heavily on what your HTML currently looks like and what you want it to look like after the changes.

Benedikt Schmidt