I am iterating through a list of links to collect all of Obama's speeches. For some links, the HTML looks like this:
<p><font face="Verdana, Arial, Helvetica, sans-serif" size="3">If
there is anyone out there who still doubts that America is a place
where all things are possible; who still wonders if the dream of
our founders is alive in our time; who still questions the power
of our democracy, tonight is your answer.</font></p>
<p><font face="Verdana, Arial, Helvetica, sans-serif" size="3">It’s
the answer told by lines that stretched around schools and churches
in numbers this nation has never seen; by people who waited three
hours and four hours, many for the very first time in their lives,
because they believed that this time must be different; that their
voice could be that difference.</font></p>
<p><font face="Verdana, Arial, Helvetica, sans-serif" size="3">It’s
the answer spoken by young and old, rich and poor, Democrat and
Republican, black, white, Latino, Asian, Native American, gay, straight,
disabled and not disabled – Americans who sent a message to
the world that we have never been a collection of Red States and
Blue States: we are, and always will be, the United States of America.</font></p>
If I call soup.find_all('font') on this page, I only get one of the paragraphs rather than the whole passage. For other links, however, the HTML looks like the text below, and there soup.find_all('font') returns the whole passage to me.
</font></strong><font face="Verdana, Arial, Helvetica, sans-serif" size="3"><br/>
</font></font><font face="Verdana, Arial, Helvetica, sans-serif" size="3"><br/>
My fellow citizens:<br/>
<br/>
I stand here today humbled by the task before us, grateful for the
trust you have bestowed, mindful of the sacrifices borne by our ancestors.
I thank President Bush for his service to our nation, as well as the
generosity and cooperation he has shown throughout this transition.<br/>
<br/>
Forty-four Americans have now taken the presidential oath. The words
have been spoken during rising tides of prosperity and the still waters
of peace. Yet, every so often the oath is taken amidst gathering clouds
and raging storms. At these moments, America has carried on not simply
because of the skill or vision of those in high office, but because
We the People have remained faithful to the ideals of our forbearers,
and true to our founding documents.<br/>
<br/>
So it has been. So it must be with this generation of Americans.<br/>
</font> <div align="left">
Goal: I want to obtain the entire speech, not just individual paragraphs. How can I achieve this using BeautifulSoup in Python?
These two speeches come from:
http://obamaspeeches.com/P-Obama-Inaugural-Speech-Inauguration.htm
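
A minimal sketch of one way to handle both layouts, assuming the speech text always sits inside `<font>` tags as in the two excerpts above: collect the text of every outermost `<font>` tag and join the pieces. The function name `extract_speech` is hypothetical. On the real pages other `<font>` tags (navigation, titles) would probably be picked up too, so some extra filtering, for example on the `size` attribute, is likely needed.

```python
import re

from bs4 import BeautifulSoup


def extract_speech(html):
    """Join the text of every outermost <font> tag into one string.

    Assumption: the speech body lives in <font> tags, either one per <p>
    (first layout) or one large tag with <br/> separators (second layout).
    """
    soup = BeautifulSoup(html, "html.parser")
    chunks = []
    for font in soup.find_all("font"):
        # Skip nested <font> tags so their text is not collected twice.
        if font.find_parent("font") is not None:
            continue
        # Collapse the hard line wraps and <br/> gaps into single spaces.
        text = re.sub(r"\s+", " ", font.get_text(" ")).strip()
        if text:
            chunks.append(text)
    # One blank line between the pieces that came from separate tags.
    return "\n\n".join(chunks)
```

With the first layout this returns all paragraphs joined by blank lines; with the second it returns the single large block as one piece, with the `<br/>` breaks flattened to spaces.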