I'm trying to extract information from several tables on an HTML page without keeping any of the indentation and tabs from the HTML source. I use get_text() to pull out the content I want, but the output is full of whitespace and tabs. I've tried .strip(), but that doesn't accomplish what I want.
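`.strip()` only trims whitespace from the two ends of the string it's called on, so the blank lines and tabs in the middle of the get_text() output survive. A minimal sketch of cleaning the interior as well, assuming the text has already been extracted into one string: trim every line, collapse runs of spaces and tabs inside a line, and drop lines that end up empty. `clean_text` is a hypothetical helper name, not part of BeautifulSoup, and the sample string is made up.

```python
import re


def clean_text(raw):
    """Collapse whitespace left over from HTML indentation.

    Each line is trimmed, interior runs of whitespace become a single
    space, and lines that are empty after trimming are dropped.
    """
    cleaned = []
    for line in raw.splitlines():
        # \s+ matches spaces, tabs and (for str patterns) non-breaking spaces
        line = re.sub(r"\s+", " ", line).strip()
        if line:
            cleaned.append(line)
    return "\n".join(cleaned)


sample = "\n\n\t\tFriday\t\n   \n\t  Comedy   Night \n\n"
print(clean_text(sample))
```

If you'd rather strip at extraction time, BeautifulSoup's get_text() also accepts `separator` and `strip` arguments: `soup.get_text("\n", strip=True)` strips each text fragment and joins the fragments with newlines.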
Here's the Python script I'm using:
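```python
from urllib.request import urlopen
from bs4 import BeautifulSoup

url = "http://www.thecomedystudio.com/schedule.html"
response = urlopen(url)

# Parse the page with Python's built-in HTML parser
soup = BeautifulSoup(response.read(), "html.parser")

# get_text() returns every text node, including the whitespace
# that only exists to indent the HTML source
text = soup.get_text()
print(text)
```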
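Because the information sits in HTML tables, another option is to skip the page-wide get_text() call and read the tables cell by cell, cleaning each cell as it is collected. With BeautifulSoup that is usually `tr.find_all(["td", "th"])` followed by `cell.get_text(" ", strip=True)`; the sketch below does the same thing with only the standard library's `html.parser`, so it runs without extra installs. It assumes the schedule uses ordinary `<tr>`/`<td>` (or `<th>`) cells and does not handle nested tables; the real page's markup isn't shown here, and `TableRowParser` and the sample HTML are hypothetical.

```python
import re
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []    # finished rows: lists of cleaned cell strings
        self._row = None  # cells of the <tr> currently being read
        self._cell = None # raw text fragments of the current cell

    def _close_cell(self):
        if self._cell is not None:
            # Collapse indentation whitespace (and &nbsp;) to single spaces
            text = re.sub(r"\s+", " ", "".join(self._cell)).strip()
            self._row.append(text)
            self._cell = None

    def _close_row(self):
        self._close_cell()
        if self._row and any(self._row):  # skip rows with no text
            self.rows.append(self._row)
        self._row = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._close_row()  # older HTML often omits </tr>
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._close_cell()  # ...and </td>
            self._cell = []
        elif tag == "br" and self._cell is not None:
            self._cell.append(" ")

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._close_cell()
        elif tag in ("tr", "table"):
            self._close_row()

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def close(self):
        super().close()
        self._close_row()


sample_html = """
<table>
  <tr>
    <th>Date</th>   <th>Show</th>
  </tr>
  <tr>
    <td>
        Friday
    </td>
    <td>Comedy&nbsp;Night<br>9pm</td>
  </tr>
</table>
"""
parser = TableRowParser()
parser.feed(sample_html)
parser.close()
print(parser.rows)
```

For the real page you would feed it the downloaded HTML (for example `response.read().decode(...)` with the page's encoding) instead of `sample_html`.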
Ultimately I'd like to build a CSV of the event calendar, but as a first step I'd like a plain-text file (or something similar) that doesn't need much manual cleanup.
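For the CSV step, Python's built-in `csv` module handles quoting, so cells that contain commas or quotes stay in one column. A minimal sketch, assuming the event data has already been pulled out as a list of rows (one list of strings per event); `rows_to_csv` is a hypothetical helper, and the column names and sample rows are made up.

```python
import csv
import io


def rows_to_csv(rows, header=None):
    """Return rows (lists of strings) as CSV text.

    csv.writer quotes any field containing a comma, quote or newline,
    so free-form show descriptions don't break the column layout.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    if header:
        writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


rows = [["Friday", "Comedy Night, 9pm"], ["Saturday", 'The "Late" Show']]
print(rows_to_csv(rows, header=["date", "show"]))
```

To write straight to a file instead, open it with `open("schedule.csv", "w", newline="")` and pass that file object to `csv.writer`; the `csv` documentation recommends `newline=""` so line endings aren't doubled on Windows.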
Any help appreciated.