I have an android app that retrieves and downloads audio from the web.

So far I have managed to get the download link to show in the WebView by calling webview.loadUrl("some javascript code").

What I want to do is get the href attribute of the download link and store it as a String.

But I have two problems:

First, there are several anchor tags with no id or class, all under the div with id "dl_links". Every one of them has an href attribute, but all except the correct download link have display:none. I have no idea how to select the visible one without using jQuery.

Secondly, since the download link is loaded with JavaScript, the URL of the website is the same before and after the download link is shown. At first I was planning to use jsoup to pull out the href attribute I need, but since fetching that URL again would not include the link, I'm not sure how to do that.

Sufian
June

2 Answers

1

I don't understand the second part of your question, about not being able to use jsoup because the URL is the same. Can you explain that better?

At any rate, pulling the links out with jsoup is really easy.

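    // Needs org.jsoup.Jsoup, org.jsoup.nodes.Document, org.jsoup.nodes.Element,
    // org.jsoup.select.Elements, java.util.ArrayList and java.util.List.
    Document doc = Jsoup.parse(pageHTML);

    // Every anchor inside <div id="dl_links">.
    Elements pageLinks = doc.select("div#dl_links a");

    List<String> theLinks = new ArrayList<String>(pageLinks.size());
    for (Element lnk : pageLinks) {
        // Drop whitespace so "display: none" and "display:none" both match.
        String style = lnk.attr("style").replaceAll("\\s+", "");
        // The decoy links are hidden, so keep only the ones that are not.
        if (!style.contains("display:none")) {
            theLinks.add(lnk.attr("href"));
        }
    }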

EDIT

You could probably also shorten this by letting the selector do the filtering and skip the hidden links, with something like...

 doc.select("div#dl_links a:not([style*=display:none])")
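
A plain substring test on the style attribute is brittle: `display: none` with a space after the colon does not match `display:none`. Here is a minimal sketch of a more forgiving check in plain Java. `InlineStyle` and `hidesElement` are names I made up for illustration, not part of jsoup, and the parsing is deliberately simplified (it ignores `!important` precedence and values that themselves contain a semicolon).

```java
import java.util.Locale;

// Hypothetical helper, not part of jsoup.
final class InlineStyle {
    private InlineStyle() {}

    /** Returns true if the inline style text sets display to none. */
    static boolean hidesElement(String styleAttr) {
        if (styleAttr == null) {
            return false;
        }
        boolean hidden = false;
        // Declarations are separated by ';' and look like "property: value".
        for (String declaration : styleAttr.split(";")) {
            int colon = declaration.indexOf(':');
            if (colon < 0) {
                continue; // not a declaration, skip it
            }
            String property = declaration.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            String value = declaration.substring(colon + 1).trim().toLowerCase(Locale.ROOT);
            // Simplification: strip "!important" instead of modelling its precedence.
            if (value.endsWith("!important")) {
                value = value.substring(0, value.length() - "!important".length()).trim();
            }
            if (property.equals("display")) {
                hidden = value.equals("none"); // a later display declaration overrides an earlier one
            }
        }
        return hidden;
    }
}
```

In the loop you would then keep a link when `!InlineStyle.hidesElement(lnk.attr("style"))`. This only sees inline styles; a link hidden through a CSS class will not be caught this way.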

EDIT 2

Since you need to get the information after a JavaScript-driven button press, you're going to want to do something like this...

  webview.loadUrl("javascript:(function() { document.querySelectorAll(\"button[type='submit']\")[0].click();})()");

The above clicks the element that needs to be clicked so that the new HTML is shown. You might need a pause or a thread sleep to make sure the new content has appeared. The problem is that WebView doesn't have a great way of letting you just read the new HTML, so if the content takes a while to appear after the button press, you will need to look into ways of waiting until that specific text is on the page.
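
If all you need is the href rather than the whole page, one option is to let the page's own JavaScript find the visible link and hand the value back. The sketch below only builds the script text in plain Java; `VisibleLinkScript` and `forContainer` are hypothetical names, and I am assuming you run the result with `WebView.evaluateJavascript` (API 19+) rather than `loadUrl`. It uses `getComputedStyle`, so it also skips links hidden through a stylesheet, not just inline `display:none`.

```java
// Hypothetical helper that builds the JavaScript to run inside the WebView.
final class VisibleLinkScript {
    private VisibleLinkScript() {}

    /**
     * Returns a JavaScript expression that evaluates to the href of the first
     * visible anchor inside the element with the given id, or null if none is visible.
     */
    static String forContainer(String containerId) {
        // Accept only simple ids so the value can be embedded in the script safely.
        if (containerId == null || !containerId.matches("[A-Za-z0-9_-]+")) {
            throw new IllegalArgumentException("unexpected container id: " + containerId);
        }
        return "(function() {"
                + "var links = document.querySelectorAll('#" + containerId + " a');"
                + "for (var i = 0; i < links.length; i++) {"
                // getComputedStyle reflects inline styles and stylesheet rules alike.
                + "  if (window.getComputedStyle(links[i]).display !== 'none') {"
                // The href property is the resolved, absolute URL.
                + "    return links[i].href;"
                + "  }"
                + "}"
                + "return null;"
                + "})()";
    }
}
```

You would pass `VisibleLinkScript.forContainer("dl_links")` to `webview.evaluateJavascript(...)` once the button click has finished updating the page. The callback should then receive the href as a JSON-quoted string, or `null` if no visible link exists yet, which makes it easy to retry after a short delay.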

This isn't an "easy" task. For ideas and concepts, refer to this question: how to get html content from a webview?

Once you have the HTML back, you just run jsoup over the page's HTML.
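
One caveat if the HTML comes back through `WebView.evaluateJavascript` (an assumption on my part; other routes such as a JavaScript interface hand you the raw string): as far as I know the callback value is JSON-encoded, so it arrives wrapped in quotes with characters like `<` written as `\u003C`, and jsoup will not see real markup until it is decoded. A minimal plain-Java sketch of that decoding step, where `JsResult.decode` is a hypothetical helper name:

```java
// Hypothetical helper: turns a JSON string literal into plain text.
final class JsResult {
    private JsResult() {}

    /** Decodes a JSON string literal; returns null for the JSON value null. */
    static String decode(String json) {
        if (json == null || json.equals("null")) {
            return null;
        }
        int end = json.length() - 1;
        if (end < 1 || json.charAt(0) != '"' || json.charAt(end) != '"') {
            throw new IllegalArgumentException("not a JSON string literal");
        }
        StringBuilder out = new StringBuilder(json.length());
        // Walk the characters between the surrounding quotes.
        for (int i = 1; i < end; i++) {
            char c = json.charAt(i);
            if (c != '\\') {
                out.append(c);
                continue;
            }
            if (++i >= end) {
                throw new IllegalArgumentException("dangling backslash");
            }
            char esc = json.charAt(i);
            switch (esc) {
                case 'n': out.append('\n'); break;
                case 't': out.append('\t'); break;
                case 'r': out.append('\r'); break;
                case 'b': out.append('\b'); break;
                case 'f': out.append('\f'); break;
                case 'u':
                    // Four hex digits follow the 'u'.
                    if (i + 4 >= end) {
                        throw new IllegalArgumentException("truncated unicode escape");
                    }
                    out.append((char) Integer.parseInt(json.substring(i + 1, i + 5), 16));
                    i += 4;
                    break;
                default:
                    out.append(esc); // escaped quote, backslash or slash
            }
        }
        return out.toString();
    }
}
```

After that, `Jsoup.parse(JsResult.decode(value))` gets real markup to work with. On Android, the bundled `org.json` classes can probably do the same decoding for you.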

Community
eqiz
  • Thank you so much! What I mean is that after loading the page, the website URL stays the same. When I load the page to get the download link, the page content changes with JavaScript, so everything else stays the same and only the download link is added to the HTML. And this is a website that already exists, so I can't change anything. – June Mar 10 '15 at 18:53
  • If you mean you can't change the page's JavaScript, then yes, you would want to use jsoup: load the HTML after the page has fully loaded, then pull the results out without having to run any JavaScript on the page. – eqiz Mar 10 '15 at 20:26
  • Please mark this as the answer or upvote it if it helped you out, for the benefit of others. – eqiz Mar 10 '15 at 22:30
  • I don't think I'm stating my problem correctly. What I meant was that when the page is updated with JavaScript, the URL of the page doesn't change, so if I use jsoup I can't get the download link because it's not there. The download link only appears after I click a button and the JavaScript code is executed. – June Mar 10 '15 at 23:58
0

var dlHolder = document.getElementById('dl_links').querySelectorAll('a');

for (var i = 0; i < dlHolder.length; i++) {
  if (dlHolder[i].tagName == 'A') {
    if (dlHolder[i].style.display === 'none') {
      alert(dlHolder[i].getAttribute('href'));
    }
  }
}

.hidden {
  display: none;
}

<div id="dl_links">
  <a href="#1">a1</a>
  <a href="#2">a2</a>
  <a href="#3">a3</a>

  <a href="#4" class="hidden">a4</a> <!-- doesn't work -->

  <a href="#5" style="display:none">a5</a> <!-- this should work -->
</div>

<a href="#6" style="display:none">a6</a> <!-- outside div -->
VenomVendor