I have this JavaScript source code from a website.

<script>{"@context": "http://schema.org/","@type": "Product","name": "Shower head","image": "https://example.com/jpeg.png","description": "Hello stackoverflow","url": "link.com","offers": {"@type": "Offer","priceCurrency": "USD","price": "10.00","itemCondition": "http://schema.org/NewCondition","availability": "http://schema.org/InStock","url": "MyUrl.com","availableAtOrFrom": {"@type": "Place","name": "Geneva, NY","geo": {"@type": "GeoCoordinates","latitude": "42.8361","longitude": "-76.9874"}},"seller": {"@type": "Person","name": "Edward"}}}</script>

And I'm trying to use this JSoup code to extract the last entry, "name": "Edward":

import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class JsoupCrawler {
    public static void main(String[] args) {
        try {
            Document doc = Jsoup.connect("https://example.com").userAgent("mozilla/17.0").get();
            Elements temp = doc.select("script.name");
            int i = 0;
            for (Element nameList : temp) {
                i++;
                System.out.println(i + " " + nameList.getElementsByTag(" ").first().text());
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

Can somebody help me with this, or is it impossible?

pushkin
Edward

1 Answer

JSoup parses HTML. The contents of the <script> element are JavaScript, not HTML, so JSoup can't interpret what is inside the <script> element; it can only hand you that content as a raw string.

It looks as if the content of the <script> element is formatted as JSON. So you could use JSoup to get the content of the <script> element, and then feed that string into a JSON parsing library. Look here if you want to dive into that: How to parse JSON in Java
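
A minimal sketch of that two-step approach, assuming the script body is a complete, well-formed JSON object and that the org.json library is on the classpath (any JSON library would do; org.json is just an assumption here). The class name SellerNameFromJson, the sellerName helper, and the "schema.org" check used to pick the right <script> are hypothetical, and the HTML is a shortened inline copy of the markup from the question so that no network access is needed.

```java
import org.json.JSONObject;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class SellerNameFromJson {

    // Returns offers.seller.name from the first <script> whose body mentions
    // schema.org, or null if no such script exists. (Hypothetical helper.)
    public static String sellerName(Document doc) {
        for (Element script : doc.select("script")) {
            // data() returns the raw script body; text() does not include it.
            String body = script.data().trim();
            if (!body.contains("schema.org")) {
                continue; // skip unrelated scripts
            }
            JSONObject product = new JSONObject(body);
            return product.getJSONObject("offers")
                          .getJSONObject("seller")
                          .getString("name");
        }
        return null;
    }

    public static void main(String[] args) {
        // Shortened copy of the question's markup, inlined for the example.
        String html = "<html><head><script>{\"@context\": \"http://schema.org/\","
                + "\"@type\": \"Product\",\"name\": \"Shower head\","
                + "\"offers\": {\"@type\": \"Offer\",\"price\": \"10.00\","
                + "\"seller\": {\"@type\": \"Person\",\"name\": \"Edward\"}}}"
                + "</script></head><body></body></html>";
        System.out.println(sellerName(Jsoup.parse(html))); // prints "Edward"
    }
}
```

Navigating through "offers" and "seller" rather than searching for any "name" key matters here, because the product itself also has a "name" ("Shower head").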

If this is a one-off and you can trust that the contents of the <script> element do not change too much, you may also use regular expressions to get to the desired part. However, I would recommend using a JSON library.
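
For completeness, a rough sketch of the regular-expression route using only java.util.regex. The class name SellerNameFromRegex and the pattern are assumptions for illustration; the pattern only works while the "seller" object contains no nested braces before "name" and the name contains no escaped quotes, which is exactly the kind of fragility that makes a JSON library the safer choice.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SellerNameFromRegex {

    // Matches: "seller": { ... "name": "<value>"
    // [^{}]*? keeps the match inside the seller object (no nested braces allowed).
    private static final Pattern SELLER_NAME = Pattern.compile(
            "\"seller\"\\s*:\\s*\\{[^{}]*?\"name\"\\s*:\\s*\"([^\"]*)\"");

    // Returns the captured seller name, or null if the pattern does not match.
    public static String sellerName(String scriptBody) {
        Matcher m = SELLER_NAME.matcher(scriptBody);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Tail of the script body from the question, inlined for the example.
        String scriptBody = "\"name\": \"Shower head\",\"offers\": {\"@type\": \"Offer\","
                + "\"seller\": {\"@type\": \"Person\",\"name\": \"Edward\"}}}";
        System.out.println(sellerName(scriptBody)); // prints "Edward"
    }
}
```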

luksch