Parsing remote web with Python BeautifulSoup

Question

https://stackoverflow.com/a/64983/468251 - Hello, I have a question about this code: how can I make it work with a remote website URL, and how can I get value = fooId['value'] from all of the inputs, not only from the first one?

You can post your request for information on that answer. Don't post a new question here. Add a comment to the existing answer. — S.Lott, Jan 12 '12 at 15:26

score 2 · Answer 1 · answered Aug 03 '17 at 03:14

To parse a page on the internet, you first need to download its HTML. There are great libraries for this, such as requests, which is often said to be the best one for Python. Say you want to parse https://stackoverflow.com/:

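import requests

# Download the page; response.text is the body decoded to a str
response = requests.get("https://stackoverflow.com/")
response.raise_for_status()  # raise on HTTP error responses (4xx/5xx)
page_html = response.text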

page_html now holds the page's HTML as a Python string, so you can treat it like the contents of a local HTML file and perform any kind of parsing on it.

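To get every occurrence of a pattern, use soup.findAll('input', attrs={'name': 'fooId', 'type': 'hidden'}) instead of soup.find(). The attribute filter has to go through attrs here: name is already the name of findAll's first parameter (the tag name), so passing name='fooId' as a keyword alongside 'input' raises a TypeError. soup.findAll returns a list of all matches; in bs4 the same method is also available as find_all.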
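
As a minimal sketch of the "all inputs" case, the snippet below parses a small inline HTML string and collects the value attribute of every hidden input named fooId. The sample markup, its values, and the helper name extract_values are made up for illustration; in practice you would pass in page_html instead. It assumes the bs4 package and uses find_all, the bs4 spelling of findAll.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for a downloaded page_html.
SAMPLE_HTML = """
<form>
  <input type="hidden" name="fooId" value="first-id">
  <input type="hidden" name="fooId" value="second-id">
  <input type="text" name="other" value="ignored">
</form>
"""


def extract_values(html, input_name="fooId"):
    """Return the value attribute of every hidden <input> with the given name."""
    soup = BeautifulSoup(html, "html.parser")
    # The HTML attribute 'name' goes through attrs, because find_all's
    # first parameter (the tag name) is itself called 'name'.
    inputs = soup.find_all("input", attrs={"name": input_name, "type": "hidden"})
    # Skip inputs that have no value attribute instead of raising KeyError.
    return [tag["value"] for tag in inputs if tag.has_attr("value")]


values = extract_values(SAMPLE_HTML)
print(values)
```

Each element of the result of find_all is a Tag, so the tag['value'] lookup from the original find example works unchanged inside the loop.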

score 1 · Accepted Answer · answered Jan 12 '12 at 15:29

The example uses a local file. To use a remote site, you need to download the page from the server and then parse the HTML.

You can look at requests or urllib2 for this (urllib2 is Python 2 only; in Python 3 its functionality lives in urllib.request).
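
For the standard-library route, a minimal sketch might look like the following. It assumes Python 3, where urllib2.urlopen became urllib.request.urlopen; the helper name fetch_html, the 10-second timeout, and the UTF-8 fallback are illustrative choices, not something the answer specifies.

```python
from urllib.request import urlopen


def fetch_html(url, timeout=10):
    """Download url and return the response body decoded to a str."""
    with urlopen(url, timeout=timeout) as response:
        # Use the charset from the Content-Type header if present;
        # falling back to UTF-8 is an assumption, not a guarantee.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

The returned string can then be handed to BeautifulSoup exactly as the contents of a local file would be.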

I hope it helps

luc


import urllib2 and urllib2.urlopen('http://...').read() work, but how do I get the elements from soup.findAll (the example only uses soup.find)? :) – Rambo Jan 12 '12 at 15:37
from doc: The find method is almost exactly like findAll, except that instead of finding all the matching objects, it only finds the first one. – luc Jan 12 '12 at 15:52
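
To make the difference in that quote concrete, here is a small sketch on made-up markup (the input names and values are hypothetical). It assumes bs4: find gives back one Tag, or None when nothing matches, while find_all gives back a list-like result that you loop over.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, used only to illustrate the difference.
html = '<input name="a" value="1"><input name="b" value="2">'
soup = BeautifulSoup(html, "html.parser")

first = soup.find("input")            # a single Tag, or None if no match
all_inputs = soup.find_all("input")   # a list-like result holding every match

first_value = first["value"]
all_values = [tag["value"] for tag in all_inputs]
print(first_value, all_values)
```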
