I recently started learning Python. Now I want to extract numbers from a web page and sum them up.

Here is my code:

# read data -> extract numbers -> compute sum
import urllib.request, urllib.parse
from bs4 import BeautifulSoup

html = urllib.request.urlopen('http://py4e-data.dr-chuck.net/comments_42.html')
file = BeautifulSoup(html, 'html.parser')
tags = file('span')
calcs = 0
for tag in tags:
    tag.decode()
    calcs += int(tag.string)
print(calcs)

On line 11 (calcs += ...) I wasn't sure what to do. Somewhere on the internet I found .string, which helped me get the numbers out of the tags, but I'm not really sure why this works or what .string does, and I couldn't find any information about it myself. If I change .string to .int, I get None.

I hope someone can explain the use of .string to me.

Thank you in advance.

Green Cloak Guy
Jens

.string is a member variable that is used to extract data from the tag: https://stackoverflow.com/questions/25327693/difference-between-string-and-text-beautifulsoup – Ben Jones Jul 23 '18 at 13:37

2 Answers

You have to convert tag.string to an int before summing:

tags = file('span')
calcs = sum(int(tag.string) for tag in tags)
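
A minimal, self-contained sketch of the same idea, assuming an inline HTML snippet in place of the live page. The snippet, its values, and the names html_doc, soup and total are made up for illustration:

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page (not the real data).
html_doc = """
<table>
  <tr><td>Alice</td><td><span class="comments">97</span></td></tr>
  <tr><td>Bob</td><td><span class="comments">3</span></td></tr>
</table>
"""

soup = BeautifulSoup(html_doc, 'html.parser')
tags = soup('span')  # calling the soup is shorthand for soup.find_all('span')

# tag.string is the text inside the tag; int() turns that text into a number.
total = sum(int(tag.string) for tag in tags)
print(total)  # 100
```
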
Sunitha

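.string is an attribute of the Tag object that holds the text inside the tag. There is no built-in .int attribute. BeautifulSoup treats an unknown attribute name as a search for a child tag with that name, so tag.int looks for an <int> tag inside the <span>, finds none, and returns None.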

What is happening in your calcs += ... line is that you get the tag's contents as a string and then convert that string to an int, which is a perfectly valid way of getting your numbers.
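
To see the difference for yourself, here is a small sketch on a made-up HTML fragment (the fragment and the names first and second are assumptions for illustration only). It also shows one case where .string itself is None: a tag that contains more than one child.

```python
from bs4 import BeautifulSoup

# Made-up fragment: one plain <span>, and one <span> with a nested <b> tag.
soup = BeautifulSoup('<p><span>42</span><span>4<b>2</b></span></p>', 'html.parser')
first, second = soup('span')

print(type(first.string).__name__)  # NavigableString, a subclass of str
print(int(first.string))            # 42
print(first.int)                    # None: there is no <int> child tag
print(second.string)                # None: this <span> has two children
print(second.get_text())            # 42 (all nested text joined together)
```

If a page might contain nested tags like the second one, get_text() is the safer choice before calling int().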

Ben Jones