
I have this HTML:

<html>
<head>
<title>Try jsoup</title>
</head>
<body class="sin">
<div class="ks">
    <div class="wrap">

        <div class="mag-right-sidebar-wrap">
            <main class="mag">

                //A lot of unneeded tags

                <article class="post-1989009 post type-post post" itemscope="" itemtype="http://schema.org/CreativeWork">
                    <header class="post-header">
                        <h1 class="post-title" itemprop="headline">Knowledge nay</h1>
                        <img src="https://ohniee.com/wp-mag/uploads/avatars/1/djsy8933e89ufio8389e8-author-img.jpg" class="avatar user-1-avatar avatar-40 photo" width="40" height="40" alt="Profile photo of Johnnie Adams">

                        <div class="flip-meta" style="padding-top:3px; margin-left: 50px">
lorem ipsum <a href="/members/iyke"><span class="flip-author" itemprop="author" itemscope itemtype="http://schema.org/Person"><span class="flip-author-name" itemprop="name"> Johnnie Adams</span></span></a> <script>
document.write(" on June 1st, 2005 00:99 ")</script>  .  <span class="flip-comments-link"><a href="https://ohniee.com/lorem-ipsum">25 Comments</a></span>
</div>
                    </header>

                    //A lot of unneeded tags
</body>
</html>

and I am trying to extract lorem ipsum Johnnie Adams on June 1st, 2005 00:99 from it. But what I get instead is lorem ipsum Johnnie Adams . 25 Comments.

Please, how do I get lorem ipsum Johnnie Adams on June 1st, 2005 00:99 from the html?

This is the code I am using:

document.select("div.flip-meta").first().text();

Jsoup demo link: https://try.jsoup.org/~BAit4PmvqNcdVAKLBv4Yp4QrXYQ

Roseyk
  • Possible duplicate of [Java - Obtain text within script tag using Jsoup](http://stackoverflow.com/questions/16780517/java-obtain-text-within-script-tag-using-jsoup) – Stephan May 16 '16 at 00:40

1 Answer


Modifying Stephan's answer:

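// Locate the inline <script> that writes the date.
Element script = document.select("div.flip-meta script").first();
if (script == null) {
    throw new RuntimeException("script element not found");
}

// Strip the document.write("...") wrapper and the surrounding whitespace.
String scriptContent = script.html()
        .replace("document.write(\"", "")
        .replace("\")", "")
        .trim();

// text() skips the script body: "lorem ipsum Johnnie Adams . 25 Comments"
String text1 = document.select("div.flip-meta").first().text();
// Drop everything from the first '.', '?' or '!' onward, plus the whitespace before it.
String text2 = text1.replaceAll("\\s*[.?!].*", "");

String finalText = text2 + " " + scriptContent;

urTextView.setText(finalText);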

This should get you lorem ipsum Johnnie Adams on June 1st, 2005 00:99
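
If the chained replace() calls feel fragile, one alternative is to capture the document.write argument with a regular expression. This is only a sketch using the standard java.util.regex classes. The class name DocumentWriteArg and the helper extractWrittenText are made up for illustration, the script source is hard-coded instead of coming from jsoup, and the pattern assumes a double-quoted argument without escaped quotes.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DocumentWriteArg {
    // Matches document.write("...") and captures the text between the double quotes.
    // Assumption: the argument contains no escaped quotes.
    private static final Pattern WRITE_CALL =
            Pattern.compile("document\\.write\\(\\s*\"([^\"]*)\"\\s*\\)");

    // Hypothetical helper: returns the trimmed argument, or "" if no call is found.
    static String extractWrittenText(String scriptSource) {
        Matcher m = WRITE_CALL.matcher(scriptSource);
        return m.find() ? m.group(1).trim() : "";
    }

    public static void main(String[] args) {
        // Stand-in for the script body taken from the question's HTML.
        String scriptSource = "\ndocument.write(\" on June 1st, 2005 00:99 \")";
        String author = "lorem ipsum Johnnie Adams";
        System.out.println(author + " " + extractWrittenText(scriptSource));
    }
}
```

In the answer's code, the string passed to extractWrittenText would be the script element's content rather than a literal.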

X09
  • Worked but what does `("\\s*[.?!].*","")` mean? – Roseyk May 15 '16 at 01:03
  • It's a regex. It matches any whitespace before the first period (or `?` or `!`) and all the text that comes after it. So `replaceAll("\\s*[.?!].*","")` matches ` . 25 Comments` and replaces it with an empty string. – X09 May 15 '16 at 01:09
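
To see what that regex does in isolation, here is a small stand-alone sketch using only String.replaceAll. The class and method names are invented for the example. Note that the pattern cuts at the first `.`, `?` or `!` anywhere in the text, so an author name containing a period would be truncated as well.

```java
class TrailingTextDemo {
    // Removes optional whitespace, the first '.', '?' or '!', and everything after it.
    static String stripFromFirstPunctuation(String text) {
        return text.replaceAll("\\s*[.?!].*", "");
    }

    public static void main(String[] args) {
        // The string the asker gets back from text().
        String text = "lorem ipsum Johnnie Adams . 25 Comments";
        System.out.println(stripFromFirstPunctuation(text));

        // Caveat: an earlier period cuts the text sooner than intended.
        System.out.println(stripFromFirstPunctuation("Dr. Adams . 25 Comments"));
    }
}
```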