
For Clarity

Before reading on, here is where I am coming from:

  1. I have zero web scraping experience.
  2. I do not know what to Google for my specific question.
  3. When I say non-conventional, I mean that the markup is not plain English and the content I want is buried pretty deep in it.
  4. The tutorial I am following, and many like it, only shows how to scrape a site whose markup reads like plain English.

My Dilemma

I am trying to scrape Reddit. I am currently following a tutorial to grab the Game of Thrones subreddit.

This is the markup I see when I pull the page using Scrapy (I confirmed it is the same in the web browser's 'inspect element'):

[Image: Scrapy Code Pull]

I read something about XML markup, but I am pretty lost.

Is there a way to look specifically for the h2 element, considering that's where the title is housed? I'm sure I can take that approach and apply it to whatever other elements I am looking for.
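To make the question concrete, here is a minimal sketch of what "look for the h2 and ignore the class names" could mean. It uses only Python's standard library rather than Scrapy so it runs anywhere, and it is not taken from the tutorial. The sample markup, the class names in it, and the names `H2Collector` and `extract_h2_titles` are all made up for illustration; the real page will differ.

```python
from html.parser import HTMLParser

# Hypothetical markup standing in for the real page: the class names are
# invented gibberish, and each post title sits inside an <h2>.
SAMPLE_HTML = """
<div class="_1poyrkZ7g36PawDueRza-J">
  <a href="/r/gameofthrones/comments/abc123/">
    <h2 class="s1okktje-0 cDxKta">First <em>post</em> title</h2>
  </a>
</div>
<div class="_1poyrkZ7g36PawDueRza-J">
  <h2 class="s1okktje-0 cDxKta">Second post title</h2>
</div>
"""


class H2Collector(HTMLParser):
    """Collects the text of every <h2>, whatever its class attribute is."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._parts = None  # None while the parser is outside an <h2>

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self._parts = []  # start buffering text for this heading

    def handle_data(self, data):
        if self._parts is not None:
            self._parts.append(data)  # includes text of nested tags

    def handle_endtag(self, tag):
        if tag == "h2" and self._parts is not None:
            self.titles.append("".join(self._parts).strip())
            self._parts = None


def extract_h2_titles(html):
    """Return the text content of each <h2> in the given HTML string."""
    parser = H2Collector()
    parser.feed(html)
    parser.close()
    return parser.titles


titles = extract_h2_titles(SAMPLE_HTML)
print(titles)  # ['First post title', 'Second post title']
```

Inside a Scrapy callback, the comparable selector would presumably be `response.css("h2::text").getall()` or `response.xpath("//h2//text()").getall()`. Either one matches on the tag name alone, so the unreadable class names never come into it. This only helps if the h2 elements are actually present in the HTML that Scrapy downloads.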

PythonReactor
  • What is non-conventional about this one? I think the only thing is that the `class` names are not in English. – Selcuk Mar 08 '19 at 03:30
  • To be fair, what I meant by non-conventional did not come across well in the title. I meant 'not plain English and not easy to get to'. – PythonReactor Mar 08 '19 at 03:35
  • If this looks too cumbersome you can always use Reddit's documented API. – Selcuk Mar 08 '19 at 03:42
  • I don't want to use Reddit's API for various reasons, hence trying to use Scrapy. Mostly, I am trying to learn in case I come across this same situation somewhere else and have no access to an API. @ivan_pozdeev I will check that one out, thank you. – PythonReactor Mar 08 '19 at 03:54
