Best way to get a value from HTML in Python?

Question

I want to scrape an attribute value in Python. I'm currently using regex, but it isn't very effective, and many people say regex is a bad fit for this kind of task, so I'd like to know what I should use instead.

Thanks

This is what I'm trying to get:

<input type="hidden" name="test" value="99948555">

The value always contains random numbers.

I would check out HTMLParser (https://docs.python.org/2/library/htmlparser.html) — K Richardson, Oct 08 '16 at 17:08
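The standard-library route suggested in this comment can be sketched roughly as follows. The linked page documents the Python 2 `HTMLParser` module; in Python 3 the same class lives in `html.parser`. The class name `InputValueParser` and the helper `extract_input_values` are made up for this illustration and are not part of any library.

```python
from html.parser import HTMLParser


class InputValueParser(HTMLParser):
    """Collects the value attribute of <input> tags with a given name.

    Hypothetical helper class written for this example.
    """

    def __init__(self, target_name):
        super().__init__()
        self.target_name = target_name
        self.values = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (attribute, value) pairs for the tag
        if tag != "input":
            return
        attributes = dict(attrs)
        if attributes.get("name") == self.target_name:
            self.values.append(attributes.get("value"))


def extract_input_values(html, name):
    """Return the values of all <input name=...> tags found in html."""
    parser = InputValueParser(name)
    parser.feed(html)
    parser.close()
    return parser.values


html = '<input type="hidden" name="test" value="99948555">'
print(extract_input_values(html, "test"))  # prints ['99948555']
```

This avoids a third-party dependency, at the cost of more boilerplate than a library such as BeautifulSoup.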

Loïc · Accepted Answer · 2016-10-08T19:29:27.877

3

I would use BeautifulSoup for this kind of parsing:

from bs4 import BeautifulSoup

html = '<input type="hidden" name="test" value="99948555">'
soup = BeautifulSoup(html, 'html.parser')
tag = soup.find('input')  # first <input> element in the document
print(tag['name'], ':', tag['value'])
# outputs: test : 99948555

What you are looking for here is: soup.find('input')['value']

See the documentation for usage and examples: https://www.crummy.com/software/BeautifulSoup/bs4/doc/

You can install it like this:

[python_binary] -m pip install bs4

edited Oct 08 '16 at 19:29

answered Oct 08 '16 at 17:13

Loïc


`pip install beautifulsoup4`, actually – OneCricketeer Oct 08 '16 at 17:35
@cricket_007 It depends on the version of Python you are using. On CentOS, for instance, I have both python2.7 and python3.4, and using pip as a module works every time, while the pip binary only works for one version of Python. So yes, I recommend using pip as a module. – Loïc Oct 08 '16 at 17:56
Well, I read the documentation and got it working, but I still can't retrieve only the value :/ I was thinking about turning the result into a string and using a regex to extract the numbers – rookiedude Oct 08 '16 at 19:10
Here's the code: `soup = BeautifulSoup(data, "lxml"); hidden_tags = soup.findAll("input", {'name': "test"}); print(hidden_tags)` – rookiedude Oct 08 '16 at 19:12
I've edited my answer: just use `soup.find('input')['value']`. When using `findAll()` you get a list, not a single object. – Loïc Oct 08 '16 at 19:30

Thanks mate, I edited it and it works just fine :) – rookiedude Oct 08 '16 at 19:48
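The difference between `find()` and `findAll()` discussed in these comments can be sketched as below. This is only an illustration: the sample markup in `data` and the helper name `get_input_value` are assumptions made for the example, and `find_all()` is the current spelling of the legacy `findAll()` alias. Filtering by `name` avoids picking up an unrelated `<input>` that happens to come first on the page.

```python
from bs4 import BeautifulSoup

# Sample markup invented for this example: two inputs, only one is wanted.
data = """
<form>
  <input type="text" name="user" value="alice">
  <input type="hidden" name="test" value="99948555">
</form>
"""

soup = BeautifulSoup(data, "html.parser")


def get_input_value(soup, name):
    """Return the value of the first <input name=...>, or None if absent.

    Hypothetical helper written for this example.
    """
    tag = soup.find("input", {"name": name})  # a single Tag, or None
    if tag is None:
        return None
    return tag.get("value")


# find_all() returns a list-like result, so each element is read separately.
all_values = [tag.get("value") for tag in soup.find_all("input", {"name": "test"})]

print(get_input_value(soup, "test"))  # prints 99948555
print(all_values)                     # prints ['99948555']
```

Checking for `None` before indexing keeps the lookup from raising when the field is missing from the page.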
