The Goose library is, according to its website, an HTML content and article extractor written in Scala. Its mission is to take any news article or article-style web page and extract not only the main body of the article but also all of its metadata and the most probable image candidate.
It was open-sourced by Gravity Labs.