I searched around on SO but couldn't find anything on this.
I'm scraping with BeautifulSoup. This is the code I'm using, which I found on SO:
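for section in soup.findAll('div', attrs={'id': 'dmusic_tracklist_track_title_B00KHQOKGW'}):
    nextNode = section
    # Walk forward through the siblings that follow the matched div
    while True:
        nextNode = nextNode.nextSibling
        try:
            tag_name = nextNode.name
        except AttributeError:
            # nextNode is None once there are no more siblings
            tag_name = ""
        if tag_name == "a":
            print(nextNode.text)  # .text is a property, not a method
        else:
            print("*****")
            break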
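One likely reason this loop prints only "*****": if each title sits at `.../div/a` under the element with the id (as the XPaths in the question suggest), the `a` is a descendant of the matched element, not a sibling, so walking `nextSibling` never reaches it. The sketch below uses a small hypothetical HTML snippet (an assumption about the page's structure, not Amazon's real markup) to compare the two lookups.

```python
from bs4 import BeautifulSoup

# Hypothetical markup shaped like the XPath
# //*[@id="dmusic_tracklist_track_title_..."]/div/a/text()
HTML = """
<div id="dmusic_tracklist_track_title_B00KHQOKGW">
  <div class="a-section a-spacing-none overflow_ellipsis"><a href="#">Track One</a></div>
</div>
"""

soup = BeautifulSoup(HTML, "html.parser")
section = soup.find("div", id="dmusic_tracklist_track_title_B00KHQOKGW")

# The matched div's only sibling here is whitespace text, so no <a> is found
sibling_links = section.find_next_siblings("a")

# The link is a descendant of the matched div, so search inside it instead
link = section.find("a")

print(len(sibling_links))
print(link.get_text(strip=True))
```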
If I went to this 50 Cent album (Animal Ambition: An Untamed Desire To Win) and wanted to scrape each song's title, how would I do so? The problem is that each song has a different ID, based on its product code. For example, here are the XPaths of the first two songs' titles: //*[@id="dmusic_tracklist_track_title_B00KHQOKGW"]/div/a/text() and //*[@id="dmusic_tracklist_track_title_B00KHQOLWK"]/div/a/text().
You'll notice the first ID ends in B00KHQOKGW, while the second ends in B00KHQOLWK. Is there a way to add a "wildcard" to the end of the ID so I can grab every song, no matter what product ID is at the end? For example, something like id="dmusic_tracklist_track_title_*" (I replaced the product ID with a *).
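BeautifulSoup accepts a compiled regular expression as an attribute filter, which gives the effect of a wildcard: `re.compile(r"^dmusic_tracklist_track_title_")` matches any `id` that starts with that prefix. A CSS attribute-prefix selector (`[id^="..."]`) passed to `select()` is an alternative (in bs4 4.7+, `select()` is backed by Soup Sieve). The HTML below is a hypothetical stand-in built from the two XPaths in the question; the real page may be structured differently.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical stand-in for the tracklist markup (not fetched from Amazon)
HTML = """
<div id="dmusic_tracklist_track_title_B00KHQOKGW">
  <div class="a-section a-spacing-none overflow_ellipsis"><a href="#">Track One</a></div>
</div>
<div id="dmusic_tracklist_track_title_B00KHQOLWK">
  <div class="a-section a-spacing-none overflow_ellipsis"><a href="#">Track Two</a></div>
</div>
"""

soup = BeautifulSoup(HTML, "html.parser")

# Option 1: a regex on the id acts as a "wildcard" after the fixed prefix
prefix = re.compile(r"^dmusic_tracklist_track_title_")
titles = []
for section in soup.find_all(id=prefix):
    # Mirror the XPath steps /div/a: direct child div, then its direct child a
    inner = section.find("div", recursive=False)
    link = inner.find("a", recursive=False) if inner is not None else None
    if link is not None:
        titles.append(link.get_text(strip=True))

# Option 2: CSS attribute-prefix selector with the same /div/a path
css_titles = [
    a.get_text(strip=True)
    for a in soup.select('[id^="dmusic_tracklist_track_title_"] > div > a')
]

print(titles)
print(css_titles)
```

Both options avoid hard-coding any product ID; the regex version keeps the `findAll`-style API from the question, while the selector version is closer to the XPath.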
Or can I use a div to target the title I want, like this? I feel like this would be best: it uses the class of the div directly above the title, and that class doesn't contain any product ID:
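for section in soup.findAll('div', attrs={'class': 'a-section a-spacing-none overflow_ellipsis'}):
    nextNode = section
    # Walk forward through the siblings that follow the matched div
    while True:
        nextNode = nextNode.nextSibling
        try:
            tag_name = nextNode.name
        except AttributeError:
            # nextNode is None once there are no more siblings
            tag_name = ""
        if tag_name == "a":
            print(nextNode.text)  # .text is a property, not a method
        else:
            print("*****")
            break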
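Targeting the class can work, with two caveats worth checking. First, `attrs={'class': 'a-section a-spacing-none overflow_ellipsis'}` matches that exact class string, so the same classes in a different order will not match, whereas `class_='overflow_ellipsis'` matches any element that has that single class. Second, a generic class like `overflow_ellipsis` may also be used elsewhere on the real page, so the sketch below (hypothetical markup, including one deliberately stray div) also checks that the parent's `id` has the track-title prefix before taking the child link.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the second track's classes are reordered, and the last
# div is a stray element that shares the class but is not a track title
HTML = """
<div id="dmusic_tracklist_track_title_B00KHQOKGW">
  <div class="a-section a-spacing-none overflow_ellipsis"><a href="#">Track One</a></div>
</div>
<div id="dmusic_tracklist_track_title_B00KHQOLWK">
  <div class="a-section overflow_ellipsis a-spacing-none"><a href="#">Track Two</a></div>
</div>
<div class="overflow_ellipsis"><a href="#">Not a track</a></div>
"""

soup = BeautifulSoup(HTML, "html.parser")

# Exact string match: misses the div whose classes are in a different order
exact = soup.find_all("div", attrs={"class": "a-section a-spacing-none overflow_ellipsis"})

# Single-class match: finds every div that has overflow_ellipsis among its classes
loose = soup.find_all("div", class_="overflow_ellipsis")

# Keep only divs whose parent id has the track-title prefix, then take the child link
titles = []
for div in loose:
    parent_id = div.parent.get("id", "")
    if parent_id.startswith("dmusic_tracklist_track_title_"):
        link = div.find("a")
        if link is not None:
            titles.append(link.get_text(strip=True))

print(len(exact), len(loose))
print(titles)
```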