Print a specific line (BeautifulSoup)

Question

Currently, my code is parsing through the link and printing all of the information from the website. I only want to print a single specific line from the website. How can I go about doing that?

Here's my code:

from bs4 import BeautifulSoup
import urllib.request

r = urllib.request.urlopen("Link goes here").read()
soup = BeautifulSoup(r, "html.parser")

# This is what I want to change. I currently have it printing everything.
# I just want a specific line from the website

print (soup.prettify())

Which line do you want? There are much better and more accurate ways than using prettify; splitting the lines and indexing is going to break with the smallest change to the HTML. – Padraic Cunningham Jun 16 '16 at 15:37
I want the line that says this each time: E9-WAREHOUSE. The "E9-WAREHOUSE" may be different each time. – Harrison Jun 16 '16 at 16:11
What other tags are around it? Can you share the link or the HTML? – Padraic Cunningham Jun 16 '16 at 16:37

score 4 · Accepted Answer · answered Jun 16 '16 at 13:09


# line_number is 1-based and must be found by hand; it shifts whenever the page's HTML changes.
li = soup.prettify().split('\n')
print(li[line_number - 1])
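
For illustration, here is a minimal offline sketch of the same line-indexing idea. The `HTML` string and the helpers `nth_line` and `line_number_of` are hypothetical stand-ins, not part of the original answer; the real page's markup will differ. It also shows how the target line number moves as soon as the surrounding markup changes.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page; the real markup will differ.
HTML = "<table><tr><td>E9-WAREHOUSE</td></tr></table>"


def nth_line(html, line_number):
    """Return the 1-based line of the prettified HTML, or None if out of range."""
    lines = BeautifulSoup(html, "html.parser").prettify().split("\n")
    if 1 <= line_number <= len(lines):
        return lines[line_number - 1].strip()
    return None


def line_number_of(html, needle):
    """Return the 1-based number of the first prettified line containing needle."""
    lines = BeautifulSoup(html, "html.parser").prettify().split("\n")
    for number, line in enumerate(lines, start=1):
        if needle in line:
            return number
    return None


before = line_number_of(HTML, "E9-WAREHOUSE")
# Wrapping the same table in one extra <div> moves the target to a different line.
after = line_number_of("<div>" + HTML + "</div>", "E9-WAREHOUSE")
print(before, after)
```

Because the two numbers differ, a hard-coded `line_number` only stays valid while the page layout stays exactly the same.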

answered Jun 16 '16 at 13:09 by kulan

THANK YOU! Worked beautifully! – Harrison Jun 16 '16 at 13:13

score 3 · Answer 2 · edited May 23 '17 at 12:24


Don't use pretty-printing to try to parse tds; select the tag specifically. If an attribute is unique, use that; if the class name is unique, just use that:

td = soup.select_one("td.content")        # by class name
td = soup.select_one('td[colspan="3"]')   # by attribute; quote the numeric value
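
As a rough sketch of both selectors, the snippet below runs against a small inline document. The markup, including the `content` class, is made up for illustration and is not taken from the real page. Note that `select_one` returns `None` when nothing matches, so it is worth checking before reading the text.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; the class name and colspan value are illustrative only.
HTML = """
<table>
  <tr><td>Label</td><td class="content">E9-WAREHOUSE</td></tr>
  <tr><td colspan="3">O6-OFFICE BUILDINGS</td></tr>
</table>
"""

soup = BeautifulSoup(HTML, "html.parser")

by_class = soup.select_one("td.content")
by_attr = soup.select_one('td[colspan="3"]')
missing = soup.select_one("td.does-not-exist")  # no such class, so this is None

# Guard against None before reading the text of a match.
for td in (by_class, by_attr, missing):
    if td is not None:
        print(td.get_text(strip=True))
```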

If it was the fourth td:

td = soup.select_one("td:nth-of-type(4)")
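
One detail that is easy to miss: `nth-of-type` counts a td among its siblings in the same row, not among all tds on the page. The following sketch, on made-up markup, shows the difference.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: six <td> cells in total, but no row has more than four.
HTML = """
<table>
  <tr><td>a</td><td>b</td></tr>
  <tr><td>one</td><td>two</td><td>three</td><td>four</td></tr>
</table>
"""

soup = BeautifulSoup(HTML, "html.parser")

# Matches the first <td> that is the fourth <td> within its own row.
fourth = soup.select_one("td:nth-of-type(4)")
fourth_text = fourth.get_text(strip=True) if fourth is not None else None

# There are six cells overall, yet no row has a fifth <td>, so nothing matches.
fifth = soup.select_one("td:nth-of-type(5)")

print(fourth_text, fifth)
```

A large index such as `td:nth-of-type(135)` therefore only matches a row that really has that many cells, which may be one reason such a selector is fragile.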

If it is in a specific table, select the table first and then find the td inside it. Splitting the HTML into lines and indexing is actually worse than using a regex to parse HTML.
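
A minimal sketch of the table-first approach, assuming the table can be identified by an id. The ids `summary` and `details` are hypothetical; on a real page, use whatever attribute or position uniquely identifies the table.

```python
from bs4 import BeautifulSoup

# Hypothetical markup with two tables; only the second holds the wanted cell.
HTML = """
<table id="summary"><tr><td>ignored</td><td>also ignored</td></tr></table>
<table id="details"><tr><td>Classification</td><td>O6-OFFICE BUILDINGS</td></tr></table>
"""

soup = BeautifulSoup(HTML, "html.parser")

# Step 1: narrow the search to one table.
table = soup.select_one("table#details")

# Step 2: search only inside that table.
cell = table.select_one("td:nth-of-type(2)") if table is not None else None
cell_text = cell.get_text(strip=True) if cell is not None else None
print(cell_text)
```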

You can get the specific td using the text from the bold tag preceding it, i.e. Department of Finance Building Classification:

In [19]: from bs4 import BeautifulSoup

In [20]: import urllib.request

In [21]: url = "http://a810-bisweb.nyc.gov/bisweb/PropertyProfileOverviewServlet?boro=1&houseno=1&street=park+ave&go2=+GO+&requestid=0"

In [22]: r = urllib.request.urlopen(url).read()

In [23]: soup = BeautifulSoup(r, "html.parser")

In [24]: print(soup.find("b", string="Department of Finance Building Classification:").find_next("td").text)
O6-OFFICE BUILDINGS
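
The same label-based lookup can be wrapped in a small helper that does not raise when the label is absent. This is only a sketch: `value_after_label` is a hypothetical name, and the inline markup (including the "Some Other Label:" row) is invented to imitate the label/value layout rather than copied from the page.

```python
from bs4 import BeautifulSoup

# Hypothetical markup imitating a label cell followed by a value cell.
HTML = """
<table>
  <tr><td><b>Some Other Label:</b></td><td>other value</td></tr>
  <tr><td><b>Department of Finance Building Classification:</b></td>
      <td colspan="3">O6-OFFICE BUILDINGS</td></tr>
</table>
"""


def value_after_label(soup, label):
    """Return the text of the first <td> after the <b> whose text equals label."""
    bold = soup.find("b", string=label)
    if bold is None:
        return None  # label not present on this page
    cell = bold.find_next("td")
    return cell.get_text(strip=True) if cell is not None else None


soup = BeautifulSoup(HTML, "html.parser")
print(value_after_label(soup, "Department of Finance Building Classification:"))
print(value_after_label(soup, "No Such Label:"))
```

Anchoring on the label text keeps working if rows are added or reordered, as long as the label itself is unchanged.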

Pick the nth table and row:

In [25]: print(soup.select_one('table:nth-of-type(8) tr:nth-of-type(5) td[colspan="3"]').text)
O6-OFFICE BUILDINGS

answered Jun 16 '16 at 16:31 by Padraic Cunningham

Then would I use print(td) after that? Because I called that and it's printing out "None". Here's my current code: http://puu.sh/puY3G/bf07dffd63.png – Harrison Jun 16 '16 at 16:39
Posted the wrong snippet. Can you add the actual link or HTML? It will be a lot easier to show you how to parse the HTML properly. – Padraic Cunningham Jun 16 '16 at 16:41
Here's the link: http://a810-bisweb.nyc.gov/bisweb/PropertyProfileOverviewServlet?boro=1&houseno=1&street=park+ave&go2=+GO+&requestid=0 – Harrison Jun 16 '16 at 16:43
In that case I would want to print "O6-OFFICE BUILDINGS" – Harrison Jun 16 '16 at 16:49
I was messing around and put td = soup.select_one("td:nth-of-type(135)"), then call print(td.get_text()), which works, but only like 40% of the time. I'm just baffled. – Harrison Jun 16 '16 at 17:37
I will be back on my notebook soon and I will show you how to get it – Padraic Cunningham Jun 16 '16 at 17:59
Are you always looking for `Department of Finance Building Classification:`? – Padraic Cunningham Jun 16 '16 at 18:26
No, I'm just looking for the "06 - Office Buildings" – Harrison Jun 16 '16 at 19:34
