I want to get the link that follows "S-1", not the one that follows "S-1/A". I tried `.find_all(lambda tag: tag.name == 'td' and tag.get()==['S-1'])` and `.select('td.s-1')`, but neither returned the link. I would appreciate any help.

Here is the relevant page source:

    <tr>
        <td>ADVANCE FINANCIAL BANCORP</td>
        <td>S-1/A</td>
        <td>10/31/1996</td>
        <td><a id="two_column_main_content_rpt_filings_fil_view_0" href="/markets/ipos/filing.ashx?filingid=1567309" target="_blank">Filing</a>
        </td>
    </tr>

    <tr>
        <td>ADVANCE FINANCIAL BANCORP</td>
        <td>S-1</td>
        <td>9/27/1996</td>
        <td><a id="two_column_main_content_rpt_filings_fil_view_1" href="/markets/ipos/filing.ashx?filingid=921318" target="_blank">Filing</a>
        </td>
    </tr>

Here is the link to the full page:

https://www.nasdaq.com/markets/ipos/company/advance-financial-bancorp-5492-13046?tab=financials

Lina

1 Answer

Try this:

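    from bs4 import BeautifulSoup
    import requests

    def getlink(url):
        # Download the page and parse it (the html5lib parser is installed separately)
        response = requests.get(url)
        mainpage = BeautifulSoup(response.text, 'html5lib')
        # Collect every table with class "marginB10px"; the second one is used here
        table = mainpage.find_all('table', attrs={"class": "marginB10px"})
        links = table[1].find_all('a')
        # Take the second link in that table (index 1)
        return links[1].get('href')

    link = getlink('https://www.nasdaq.com/markets/ipos/company/advance-financial-bancorp-5492-13046?tab=financials')
    # The href is site-relative, so prepend the domain
    mainlink = 'https://www.nasdaq.com'
    link = mainlink + link
    print(link)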

output:

https://www.nasdaq.com/markets/ipos/filing.ashx?filingid=921318
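
If the position of the link varies from page to page, one alternative is to pick the row by its form-type cell rather than by index. The following is only a sketch under assumptions: it parses the two rows from the question (wrapped in a `<table>` for the example) instead of the live page, and the names `SAMPLE_HTML` and `find_filing_links` are made up for illustration. It compares the stripped text of each cell with `==` semantics, so "S-1/A" does not match "S-1".

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Rows copied from the page source in the question (wrapped in a table here)
SAMPLE_HTML = """
<table>
<tr>
    <td>ADVANCE FINANCIAL BANCORP</td>
    <td>S-1/A</td>
    <td>10/31/1996</td>
    <td><a id="two_column_main_content_rpt_filings_fil_view_0" href="/markets/ipos/filing.ashx?filingid=1567309" target="_blank">Filing</a>
    </td>
</tr>
<tr>
    <td>ADVANCE FINANCIAL BANCORP</td>
    <td>S-1</td>
    <td>9/27/1996</td>
    <td><a id="two_column_main_content_rpt_filings_fil_view_1" href="/markets/ipos/filing.ashx?filingid=921318" target="_blank">Filing</a>
    </td>
</tr>
</table>
"""


def find_filing_links(html, form_type, base_url="https://www.nasdaq.com"):
    """Return absolute links from rows containing a cell equal to form_type."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    for row in soup.find_all("tr"):
        # Stripped text of every cell in this row
        cell_texts = [td.get_text(strip=True) for td in row.find_all("td")]
        # List membership is an exact comparison, so "S-1/A" is not "S-1"
        if form_type in cell_texts:
            anchor = row.find("a", href=True)
            if anchor is not None:
                # Turn the site-relative href into an absolute URL
                links.append(urljoin(base_url, anchor["href"]))
    return links


s1_links = find_filing_links(SAMPLE_HTML, "S-1")
print(s1_links)
# ['https://www.nasdaq.com/markets/ipos/filing.ashx?filingid=921318']
```

For the live page, `response.text` from `requests.get(url)` could be passed in place of `SAMPLE_HTML`, assuming the real table has the same row layout as the excerpt.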

Ishara Madhawa

  • Many thanks, Ishara! It works perfectly for the link I shared with you. – Lina Jun 08 '18 at 02:19
  • However, when I apply the same method to other links, I run into an issue: the index [1] changes with the link (firm). Some "S-1" forms are at index [1], some at [2], some at [3], and so on. One way I can see to identify the S-1 form is by matching exactly "S-1/". Do you know if I can locate the links by matching the form name, say "S-1"? Alternatively, the "S-1" form is always the last link, if that helps. I cannot find a way to count the number of results. Do you have any tricks? – Lina Jun 08 '18 at 02:31
  • If it is always the last link, use `links[-1]` instead of `links[1]`. It returns the last element of the list `links`. – Ishara Madhawa Jun 08 '18 at 02:36
  • Thank you very much for your timely reply! It solves the issue well. For my future reference, would you please point me to the documentation for the indexing rule you used? Excuse me for all the basic questions. I have been learning Python for two weeks and am still finding my way through the documentation and resources. – Lina Jun 08 '18 at 02:48
  • https://stackoverflow.com/questions/930397/getting-the-last-element-of-a-list-in-python Here is the answer for the indexing rule I used. On Stack Overflow you may find solutions for many issues. – Ishara Madhawa Jun 08 '18 at 02:52
  • Got it. I'm glad that I joined Stack Overflow! – Lina Jun 08 '18 at 03:00
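
For future readers, the indexing rule and the counting question from the comments can be illustrated with a plain list. This is a small sketch: `links` is a hypothetical stand-in for the list returned by `find_all('a')`, and `last_or_none` is a made-up helper name. Negative indexes count from the end of a list, and `len()` gives the number of results.

```python
# Hypothetical stand-in for the list returned by table[1].find_all('a')
links = ["filing_a", "filing_b", "filing_c"]

count = len(links)                 # number of results found
last = links[-1]                   # last element
same_last = links[len(links) - 1]  # equivalent, but more verbose
second_last = links[-2]            # second element from the end


def last_or_none(items):
    """Return the last item, or None when the list is empty."""
    # Indexing an empty list with [-1] raises IndexError, so check first
    return items[-1] if items else None


print(count, last, second_last)
# 3 filing_c filing_b
```

The guard in `last_or_none` matters when a page has no matching links at all, since `links[-1]` on an empty list raises `IndexError`.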