How could I scrape a website with random, non-basic markup code?

Question

For Clarity

Some context on where I am coming from:

I have 0 web scraping experience.
I do not know what to Google for my specific question.
When I say non-conventional, I mean that the markup is not plain English and the content I want is buried pretty deep in it.
The tutorial I am following, and many like it, only shows how to scrape a site whose markup is ‘plain-English’ like.

My Dilemma

I am trying to scrape reddit. I am currently following a tutorial to grab the Game of Thrones subreddit.

This is the code I am seeing when I pull the request using Scrapy (confirmed to be the same with the web browser’s ‘inspect element’):

I read something about XML markup, but I am pretty lost.

Is there a way to look specifically for the h2 element, considering that’s where the title is housed? I’m sure I can take that approach and apply it to whatever other elements I am looking for...
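To make the idea concrete, here is a minimal sketch of selecting by tag name instead of by class name. It uses only Python's standard-library `html.parser`, so it runs without Scrapy. The sample markup, the class names in it, and the helper names `H2TextCollector` and `extract_h2_titles` are all made up for illustration; this is not Reddit's real HTML. Inside a Scrapy callback, the rough equivalent would be `response.css("h2::text").getall()` or `response.xpath("//h2/text()").getall()`, assuming the titles really do sit directly inside `h2` elements.

```python
from html.parser import HTMLParser


class H2TextCollector(HTMLParser):
    """Collect the text inside every <h2>, ignoring class names entirely."""

    def __init__(self):
        super().__init__()
        self._depth = 0      # > 0 while the parser is inside an <h2>
        self._chunks = []    # text pieces of the <h2> currently being read
        self.titles = []     # finished titles, in document order

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self._depth += 1

    def handle_data(self, data):
        # Keep text only while inside an <h2>, including nested tags.
        if self._depth:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "h2" and self._depth:
            self._depth -= 1
            if self._depth == 0:
                title = "".join(self._chunks).strip()
                if title:
                    self.titles.append(title)
                self._chunks = []


def extract_h2_titles(html):
    """Return the text of every <h2> element found in the given HTML string."""
    parser = H2TextCollector()
    parser.feed(html)
    parser.close()
    return parser.titles


# Hypothetical markup with machine-generated class names (NOT real Reddit HTML).
SAMPLE_HTML = """
<div class="x1a2b3">
  <a class="q9z8y7" href="/r/example/1"><h2 class="s56cc5r-0 jpXBut">First post title</h2></a>
  <span class="k4m5n6">42 points</span>
  <a class="q9z8y7" href="/r/example/2"><h2 class="s56cc5r-0 jpXBut">Second <em>post</em> title</h2></a>
</div>
"""

print(extract_h2_titles(SAMPLE_HTML))
```

The point of the sketch is that the unreadable class names never have to be matched: the tag name alone is enough to locate the titles, and the same pattern should carry over to other elements once you know which tag or attribute wraps them.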

What is non-conventional with this one? I think the only thing is that `class` names are not in English language. — Selcuk, Mar 08 '19 at 03:30
I’ll be fair, the way I meant non-conventional, was not portrayed well in that title. I meant it as in ‘not plain English and easy to get to’ — PythonReactor, Mar 08 '19 at 03:35
If this looks too cumbersome you can always use Reddit's documented API. — Selcuk, Mar 08 '19 at 03:42
I don’t want to use Reddit’s API for various reasons, hence trying to use Scrapy. More so, because I am trying to learn in the event that I come across this same situation somewhere else, and have no access to an API. @ivan_pozdeev I will check that one out, thank you. — PythonReactor, Mar 08 '19 at 03:54
