I'm trying to parse and print some data from Twitter. I have code that prints tweets, but when I apply the same approach to usernames it doesn't work. I want to do this without using the Twitter API.
Here is what I have that prints the tweets:
```python
import re
import time

# opener and keyWord are assumed to be defined elsewhere in the script
def main():
    try:
        sourceCode = opener.open('https://twitter.com/search?f=realtime&q='\
            +keyWord+'&src=hash').read()
        splitSource = re.findall(r'<p class="js-tweet-text tweet-text">(.*?)</p>', sourceCode)
        print len(splitSource)
        print splitSource
        for item in splitSource:
            print '\n _____________________\n'
            # strip any remaining inline tags from the tweet text
            print re.sub(r'<.*?>','',item)
    except Exception, e:
        print str(e)
        print 'error in main try'
        time.sleep(555)

main()
```
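A minimal sketch of what that extraction loop does, assuming a hypothetical one-line HTML sample in place of the live Twitter page: `re.findall` captures the text between the opening and closing `<p>` tags, then `re.sub` strips any nested tags such as links. Note that each tweet in the sample sits on a single line.

```python
import re

# Hypothetical sample markup standing in for the downloaded search page.
# Each tweet's <p> element is on a single line.
sample = (
    '<p class="js-tweet-text tweet-text">Hello <a href="/x">#python</a></p>\n'
    '<p class="js-tweet-text tweet-text">Second tweet</p>\n'
)

# Non-greedy capture of everything between the opening tag and the first </p>.
tweets = re.findall(r'<p class="js-tweet-text tweet-text">(.*?)</p>', sample)

# Remove any remaining inline tags (links, spans) from each captured tweet.
cleaned = [re.sub(r'<.*?>', '', item) for item in tweets]
print(cleaned)
```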
To print the username info, I changed `opener` to `browser`, but it still finds and opens the page, so I don't think that's the issue:
```python
import re

# browser, firstName and lastName are assumed to be defined elsewhere in the script
def main():
    try:
        pageSource = browser.open('https://twitter.com/search?q='\
            +firstName+'%20'+lastName+'&src=hash&mode=users').read()
        splitSource = re.findall(r'<p class="bio ">(.*?)</p>', pageSource)
        for item in splitSource:
            print '\n'
            # strip any remaining inline tags from the bio text
            print re.sub(r'<.*?>','',item)
    except Exception, e:
        print str(e)
        print 'error in main try'

main()
```
It prints the page source fine. The issue seems to be with this line:

```python
splitSource = re.findall(r'<p class="bio ">(.*?)</p>', pageSource)
```

It doesn't find anything at all. Here is a copy of the source I'm trying to pull the info from:
```html
<div class="content">
  <div class="stream-item-header">
    <a class="account-group js-user-profile-link" href="/BarackObama" >
      <img class="avatar js-action-profile-avatar " src="https://pbs.twimg.com/profile_images/451007105391022080/iu1f7brY_normal.png" alt="" data-user-id="813286"/>
      <strong class="fullname js-action-profile-name">Barack Obama</strong><span class="Icon Icon--verified Icon--small"><span class="u-isHiddenVisually">Verified account</span></span>
      <span class="username js-action-profile-name">@BarackObama</span>
    </a>
  </div>
  <p class="bio ">
    This account is run by Organizing for Action staff. Tweets from the President are signed -bo.
  </p>
</div>
```
I feel like something in this source is preventing me from getting the bio info. The spacing, maybe? I don't know.
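One thing worth checking, shown here as a minimal sketch on a trimmed copy of the bio markup above: in Python's `re` module, `.` does not match a newline unless the `re.DOTALL` flag is passed, and in this markup the bio text sits on its own line between `<p class="bio ">` and `</p>`. The tweet pattern works because each tweet's text appears to be on one line; the bio pattern has to cross line breaks.

```python
import re

# Trimmed copy of the bio markup from the question; the text is on its own line.
page_source = '''<p class="bio ">
This account is run by Organizing for Action staff. Tweets from the President are signed -bo.
</p>'''

pattern = r'<p class="bio ">(.*?)</p>'

# Default flags: '.' stops at newline characters, so nothing matches.
without_flag = re.findall(pattern, page_source)

# re.DOTALL lets '.' match newlines too; strip() trims the surrounding line breaks.
with_flag = [item.strip() for item in re.findall(pattern, page_source, re.DOTALL)]

print(without_flag)
print(with_flag)
```

The same flag can be passed as the third argument to `re.findall` in the username script; stripping whitespace afterwards removes the newlines and indentation around the bio text.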