I've been trying to figure out why jsoup's .select("div.zn-body__paragraph") hasn't been working on certain CNN articles. For articles like this it doesn't work even though the page clearly contains elements with that class, whereas an article like this works. Here's the complete code I've written:
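import java.io.IOException;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public static String getContentCNN(String link) throws IOException {
    // Collect the text of every matching paragraph div, separated by blank lines.
    StringBuilder finalString = new StringBuilder();
    Elements paragraphs = getDocsCNN(link).select("div.zn-body__paragraph");
    for (Element p : paragraphs) {
        finalString.append(p.text()).append("\n\n");
    }
    return finalString.toString();
}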
Both articles contain div elements with this class, like this:
<div class="zn-body__paragraph">Nadler on Wednesday said he didn't know the White House's motives, but he would not allow the White House to try to claim that the President cannot be held accountable.</div>
<div class="zn-body__paragraph">"I don't know whether they're trying to taunt us toward an impeachment or anything else," Nadler said. "All I know is they have made a preposterous claim."</div>
So far, I've tried div#class, div[class], and getElementsByClass("class").
Thanks.
EDIT: Here is the source code for getDocsCNN():
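import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public static Document getDocsCNN(String link) throws IOException {
    // Fetch and parse the page; the timeout is in milliseconds.
    return Jsoup.connect(link).userAgent("Mozilla").timeout(6000).get();
}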
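One check that might narrow this down: jsoup only parses the HTML the server returns and does not execute JavaScript, so it is worth confirming that the class is present in the fetched markup at all. The sketch below is a rough diagnostic using only the standard library. RawHtmlCheck and countClassOccurrences are hypothetical names, not part of jsoup, the sample HTML is made up for illustration, and the regex assumes double-quoted class attributes. The input string could come from getDocsCNN(link).outerHtml().

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RawHtmlCheck {

    // Hypothetical helper (not a jsoup API): counts class attributes in the raw
    // HTML that list the given class name. This is a rough regex-based check,
    // not a real HTML parser, and it only handles double-quoted attributes.
    public static int countClassOccurrences(String html, String className) {
        Pattern classAttr = Pattern.compile("class\\s*=\\s*\"([^\"]*)\"");
        Matcher m = classAttr.matcher(html);
        int count = 0;
        while (m.find()) {
            // A class attribute may hold several space-separated names.
            for (String token : m.group(1).trim().split("\\s+")) {
                if (token.equals(className)) {
                    count++;
                    break;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Made-up sample markup standing in for the fetched page source.
        String sample = "<div class=\"zn-body__paragraph\">First.</div>"
                + "<div class=\"zn-body__paragraph speakable\">Second.</div>"
                + "<div class=\"l-container\">Other.</div>";
        System.out.println(countClassOccurrences(sample, "zn-body__paragraph"));
    }
}
```

If the count is zero for the failing article but not for the working one, the paragraphs are probably not in the served HTML (for example, they may be added later by a script), in which case no selector would find them.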