
I'm using PRAW (the Python Reddit API Wrapper) to take post titles from a certain subreddit. I only want to take a title if the URL the post links to ends in .tv.

How to extract specific URLs that contain the top-level domain .tv and append them to their own list?

    import praw

    reddit = praw.Reddit(client_id='', client_secret='', user_agent='')  # fill in your credentials
    hot_p = reddit.subreddit('music').top(time_filter='week')

    raw_titles = []
    raw_url = []

    for post in hot_p:
        # if post.url ends in .tv...
        raw_titles.append(post.title)
        raw_url.append(post.url)
my name jeff
  • Use regular expressions or the `urlparse` function from `urllib.parse` to filter the URLs that end with `.tv`. – Victor Ruiz Aug 16 '19 at 10:13
  • https://stackoverflow.com/questions/6925825/get-subdomain-from-url-using-python shows how to extract the subdomain from a URL – Victor Ruiz Aug 16 '19 at 10:13

1 Answer


I will assume that the URL can be http://a.b.tv/etc or even http://a.b.tv:80/etc (with a port number), so:

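    from urllib.parse import urlparse

    for post in hot_p:
        o = urlparse(post.url)
        # netloc is e.g. 'a.b.tv' or 'a.b.tv:80': take the last dot-separated
        # label, then drop any ':port' suffix
        top_level_domain = o.netloc.split('.')[-1].split(':')[0]
        if top_level_domain.lower() == 'tv':  # host names are case-insensitive
            raw_titles.append(post.title)
            raw_url.append(post.url)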
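A variant of the same idea, sketched on the assumption that you are on Python 3 with `urllib.parse`: the `hostname` attribute of the `urlparse` result already drops any port and `user:password@` prefix and lowercases the host, so the manual `split(':')` is not needed. The helper names `has_tv_tld` and `split_tv_posts` are made up for this example, and the sample posts are hypothetical stand-ins for PRAW submissions so the snippet runs without Reddit credentials.

```python
from types import SimpleNamespace
from urllib.parse import urlparse


def has_tv_tld(url):
    """Return True if the host part of `url` is in the .tv top-level domain."""
    # .hostname drops any ':port' and 'user:password@' part and lowercases
    # the host; it is None when the URL has no host (e.g. '/r/music').
    host = urlparse(url).hostname
    if not host:
        return False
    # Tolerate a fully qualified name with a trailing dot, e.g. 'a.b.tv.'
    host = host.rstrip('.')
    return host.rsplit('.', 1)[-1] == 'tv'


def split_tv_posts(posts):
    """Collect titles and URLs of the posts whose link points at a .tv host.

    `posts` can be any iterable of objects with .title and .url attributes,
    such as the PRAW listing `hot_p` from the question.
    """
    raw_titles, raw_url = [], []
    for post in posts:
        if has_tv_tld(post.url):
            raw_titles.append(post.title)
            raw_url.append(post.url)
    return raw_titles, raw_url


if __name__ == '__main__':
    # Made-up stand-ins for PRAW submissions, so the sketch runs offline.
    sample = [
        SimpleNamespace(title='Live set', url='https://www.twitch.tv/somechannel'),
        SimpleNamespace(title='Video', url='https://www.youtube.com/watch?v=abc'),
        SimpleNamespace(title='Self post', url='/r/music/comments/abc/'),
    ]
    titles, urls = split_tv_posts(sample)
    print(titles)  # ['Live set']
```

Checking the host rather than calling `post.url.endswith('.tv')` matters because a link such as `https://www.twitch.tv/somechannel` has a path after the domain, while `https://example.com/page.tv` ends in `.tv` without being on a `.tv` host.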
Booboo