I'm trying to build an app that takes the URL of an article and converts the article at that URL into readable text. So far I have managed to fetch the entire HTML from that URL, but how can I extract only the text of the article? I am using newsapi.org to get the URL. Any suggestions?
-
Did you try parsing only sections of the DOM? Did you see if there are any open source libraries that help with this task? – bright-star Feb 26 '17 at 01:00
-
I tried to find any open source library to help me, but I haven't found anything yet. If you could help me with something, it would be amazing. – Ciocirlan Cosmin Gabriel Feb 26 '17 at 23:46
-
I am trying to create something like Safari's Reader Mode. I only want the important parts of the HTML. – Ciocirlan Cosmin Gabriel Feb 28 '17 at 21:08
-
Sounds good. You're going to have to take a library like the one listed in the duplicate, parse the HTML, and extract the parts of the DOM you want to keep. Give it a try. – bright-star Feb 28 '17 at 21:23
-
Thanks for the advice! I am going to try it and tell you if it works. – Ciocirlan Cosmin Gabriel Feb 28 '17 at 21:41
-
You should indicate in the UI that this question is a duplicate before you go. – bright-star Feb 28 '17 at 21:45
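The approach suggested in the comments (parse the HTML, then keep only the parts of the DOM you want) could be sketched roughly as follows. This is a minimal illustration using only Python's standard-library `html.parser`, not the library from the duplicate question. The names `ParagraphExtractor` and `extract_article_text` are made up for this example, and the heuristic (keep `<p>` text, drop anything inside `script`, `style`, `nav`, `header`, `footer`, or `aside`) is an assumption that will not hold for every site.

```python
from html.parser import HTMLParser


class ParagraphExtractor(HTMLParser):
    """Collects the text of <p> elements, ignoring non-content containers."""

    # Assumed heuristic: text inside these tags is not article content.
    SKIP_TAGS = {"script", "style", "nav", "header", "footer", "aside"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.paragraphs = []   # finished paragraph strings
        self._current = None   # text chunks of the open <p>, or None
        self._skip_depth = 0   # > 0 while inside a tag from SKIP_TAGS

    def _flush(self):
        # Close the paragraph being read and store it if it has any text.
        if self._current is not None:
            text = " ".join("".join(self._current).split())
            if text:
                self.paragraphs.append(text)
            self._current = None

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1
        elif tag == "p" and self._skip_depth == 0:
            self._flush()          # tolerate an unclosed previous <p>
            self._current = []

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1
        elif tag == "p":
            self._flush()

    def handle_data(self, data):
        if self._current is not None and self._skip_depth == 0:
            self._current.append(data)


def extract_article_text(html):
    """Return the paragraph text of an HTML page, one blank line between paragraphs."""
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    parser._flush()  # keep a trailing <p> that was never closed
    return "\n\n".join(parser.paragraphs)
```

You would pass it the HTML string you already download, for example `extract_article_text(html_string)`. For something closer to Reader Mode you would probably want a dedicated readability-style library that scores DOM nodes by text density, since a plain `<p>` filter also picks up captions, comments, and similar page furniture.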