
So I want to scrape an attribute value in Python. Currently I'm using regex, but it's not that effective, so I wanted to know what I should use instead, since many people say that regex is bad for this kind of thing.

Thanks

This is what I'm trying to get:

<input type="hidden" name="test" value="99948555"> 

The value attribute always contains random numbers.
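To make the concern about regex concrete, here is a small sketch. The pattern is a hypothetical example of the kind of regex one might write for this input, and the HTML variants are made up. It assumes the beautifulsoup4 package is installed and uses its built-in `html.parser` backend (the same library the answer below recommends). The regex breaks as soon as attribute order, quoting, or whitespace changes, while the parser reads the attribute the same way each time.

```python
import re

from bs4 import BeautifulSoup  # assumes the beautifulsoup4 package is installed

# Hypothetical regex of the kind often written for this: it assumes name comes
# directly before value, separated by one space, with double quotes.
PATTERN = re.compile(r'name="test" value="(\d+)"')

# Made-up variants of the same hidden input field.
SNIPPETS = [
    '<input type="hidden" name="test" value="99948555">',
    '<input type="hidden" value="99948555" name="test">',  # attributes reordered
    "<input type='hidden' name='test' value='99948555'>",  # single quotes
    '<input type="hidden" name="test"\n       value="99948555">',  # line break inside the tag
]


def with_regex(html):
    """Return the captured digits, or None if the pattern does not match."""
    match = PATTERN.search(html)
    return match.group(1) if match else None


def with_parser(html):
    """Return the value attribute of <input name="test">, or None if absent."""
    tag = BeautifulSoup(html, 'html.parser').find('input', attrs={'name': 'test'})
    return tag.get('value') if tag is not None else None


for snippet in SNIPPETS:
    print(repr(with_regex(snippet)), repr(with_parser(snippet)))
```

With these inputs the regex only matches the first variant, while the parser returns '99948555' for all four.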

OneCricketeer
rookiedude

1 Answer


I would use BeautifulSoup for this kind of parsing:

from bs4 import BeautifulSoup
html = '<input type="hidden" name="test" value="99948555">'
soup = BeautifulSoup(html, 'html.parser')
tag = soup.find('input')  # first <input> tag in the document
print(tag['name'], ':', tag['value'])
# outputs: test : 99948555

What you are looking for here is: soup.find('input')['value']
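On a real page there is usually more than one `<input>`, and `soup.find('input')` returns only the first one. A minimal sketch, assuming the target field is identified by its `name` attribute as in the question (the surrounding form HTML is made up): filter on `name`, use `.get()` so a missing attribute gives `None` rather than a `KeyError`, and convert to `int` since the value is numeric. `select_one()` with a CSS attribute selector is shown as an equivalent lookup.

```python
from bs4 import BeautifulSoup

# Made-up page with several inputs; only the one with name="test" is wanted.
html = '''
<form>
  <input type="text" name="user" value="rookiedude">
  <input type="hidden" name="csrf" value="abc123">
  <input type="hidden" name="test" value="99948555">
</form>
'''

soup = BeautifulSoup(html, 'html.parser')

# Match on the name attribute instead of taking the first <input>.
tag = soup.find('input', attrs={'name': 'test'})
value = tag.get('value') if tag is not None else None  # None if tag or attribute is missing
print(value)

# The same lookup expressed as a CSS attribute selector.
same = soup.select_one('input[name="test"]')['value']

# Attribute values are strings; convert when a number is needed.
number = int(value)
print(number)
```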

See the documentation for usage and examples: https://www.crummy.com/software/BeautifulSoup/bs4/doc/

You can install it like this:

[python_binary] -m pip install beautifulsoup4
Loïc
  • `pip install beautifulsoup4`, actually – OneCricketeer Oct 08 '16 at 17:35
  • @cricket_007 It depends on the version of Python you are using. On CentOS, for instance, I have both Python 2.7 and Python 3.4, and using pip as a module works every time, while the pip binary only works for one version of Python. So yes, I recommend using pip as a module. – Loïc Oct 08 '16 at 17:56
  • Well, I read the documentation and got it working, but I still can't retrieve only the value :/ I was thinking about turning the result into a string and using a regex to pull out the numbers. – rookiedude Oct 08 '16 at 19:10
  • Here's the code: `soup = BeautifulSoup(data, "lxml"); hidden_tags = soup.findAll("input", {'name': "test"}); print(hidden_tags)` – rookiedude Oct 08 '16 at 19:12
  • I've edited my answer: just use `soup.find('input')['value']`. When using `findAll()` you'll get a list, not a single object. – Loïc Oct 08 '16 at 19:30
  • Thanks mate, edited it and it works just fine :) – rookiedude Oct 08 '16 at 19:48
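The point in the comments above is that `findAll()` (the older spelling of `find_all()`) returns a list-like `ResultSet` of tags, so the attribute has to be read from each tag rather than from the list. A sketch that keeps the comment's `name` filter but extracts the values; the sample HTML stands in for the downloaded page (`data` in the comment), and `html.parser` is used here instead of `"lxml"` so that no extra package is needed.

```python
from bs4 import BeautifulSoup

# Made-up stand-in for the downloaded page (`data` in the comment).
data = '<input type="hidden" name="test" value="99948555">'

soup = BeautifulSoup(data, 'html.parser')

# find_all() returns a ResultSet (a list of Tag objects), even when only one tag matches.
hidden_tags = soup.find_all('input', {'name': 'test'})

# Indexing the list itself with ['value'] raises TypeError; read the attribute from each Tag.
values = [tag['value'] for tag in hidden_tags]
print(values)

# If exactly one match is expected, take the first element (or call find() instead).
first_value = hidden_tags[0]['value'] if hidden_tags else None
print(first_value)
```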