For Clarity
Read on, understanding where I am coming from:
- I have 0 web scraping experience.
- I do not know what to Google for with regard to my specific question.
- When I say non-conventional I mean it in the sense that it’s not plain English, and it’s buried pretty deep in the markup.
- The tutorial I am following, like many others, only shows how to scrape sites whose markup reads like plain English.
My Dilemma
I am trying to scrape reddit. I am currently following a tutorial to grab the Game of Thrones subreddit.
This is the code I am seeing when I pull the request using scrapy (confirmed the same with the web browser’s ‘inspect element’):
I was reading something about XML markup or something, but I am pretty lost.
Is there a way to look specifically for the h2 tag, considering that’s where the title is housed? I’m sure I can take that approach and apply it to whatever other elements I am looking for...
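To make the question concrete, here is a rough sketch of what I mean by "look specifically for the h2". It uses only Python's standard-library html.parser, not scrapy, and the markup in SAMPLE_HTML is a made-up stand-in (hypothetical class names and titles), not the real reddit page. The names H2Collector and extract_h2_titles are my own inventions for illustration.

```python
from html.parser import HTMLParser

# Hypothetical markup: a made-up stand-in for deeply nested,
# non-plain-English HTML. This is NOT the real reddit page.
SAMPLE_HTML = """
<div class="x1a2b3"><div class="y9z8">
  <a href="/r/example/1"><h2 class="q7w6">First post title</h2></a>
  <a href="/r/example/2"><h2 class="q7w6">Second post <em>title</em></h2></a>
</div></div>
"""


class H2Collector(HTMLParser):
    """Collects the text of every <h2>, ignoring class names and nesting."""

    def __init__(self):
        super().__init__()
        self.titles = []      # finished h2 texts, in document order
        self._in_h2 = False   # True while the parser is inside an <h2>
        self._parts = []      # text fragments of the current <h2>

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self._in_h2 = True
            self._parts = []

    def handle_data(self, data):
        # Text inside child tags (like <em>) also arrives here.
        if self._in_h2:
            self._parts.append(data)

    def handle_endtag(self, tag):
        if tag == "h2" and self._in_h2:
            self.titles.append("".join(self._parts).strip())
            self._in_h2 = False


def extract_h2_titles(html):
    """Return the text of every <h2> element found in the given HTML string."""
    parser = H2Collector()
    parser.feed(html)
    parser.close()
    return parser.titles


if __name__ == "__main__":
    print(extract_h2_titles(SAMPLE_HTML))
```

Inside a scrapy spider, I believe the equivalent would be a selector call such as response.css("h2::text").getall(), though I have not confirmed that it works against the actual page.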