Scraping Reddit with Python

Question

I'm new to web scraping, and I have a problem when I try to scrape posts from Reddit: it only shows me the top 3 or 4 results, as in this image.

The code that I use is:

import pandas as pd
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service as ChromeService
from webdriver_manager.chrome import ChromeDriverManager


# Start Chrome; webdriver_manager downloads a matching chromedriver
driver = webdriver.Chrome(service=ChromeService(ChromeDriverManager().install()))
driver.get("https://www.reddit.com/r/football/")
html = driver.page_source  # the HTML as rendered at this moment
driver.quit()
soup = BeautifulSoup(html, 'html.parser')

post_elements = soup.find_all('shreddit-post', class_='block cursor-pointer relative bg-neutral-background focus-within:bg-neutral-background-hover hover:bg-neutral-background-hover xs:rounded-[16px] p-md my-2xs nd:visible')

post_data_list = []

for post_elem in post_elements:
    post_data = {}

    # Most fields are attributes of the <shreddit-post> element itself
    post_data['post_title'] = post_elem['post-title']
    post_data['permalink'] = post_elem['permalink']
    post_data['author'] = post_elem['author']
    post_data['timestamp'] = post_elem.find('time')['datetime']
    post_data['score'] = post_elem['score']
    post_data['domain'] = post_elem['domain']

    post_data_list.append(post_data)

reddit_df = pd.DataFrame(post_data_list)
print(reddit_df)  # only 3 or 4 rows appear

Is there any way to get data from the rest of the posts on Reddit? (There are more than three posts on the page.)

I tried exporting it to CSV, but there are still only three results.

After running the code above, I expected to see a bigger sheet with data from more posts.

If Reddit limits scraping, is there a way to get around that limit?

Please clarify your specific problem or provide additional details to highlight exactly what you need. As it's currently written, it's hard to tell exactly what you're asking. — Community, Aug 12 '23 at 11:06
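For the Selenium approach in the question: the page source read immediately after `driver.get` most likely contains only the posts Reddit renders before its infinite scroll loads more, which would explain the 3 or 4 rows. A minimal sketch, assuming lazy loading is the cause, scrolls to the bottom repeatedly and stops once the page height no longer grows, then returns `page_source`. The function name `scroll_to_load` and its `max_scrolls` and `pause` parameters are hypothetical; `execute_script` and `page_source` are standard Selenium WebDriver members.

```python
import time


def scroll_to_load(driver, max_scrolls=10, pause=2.0):
    """Scroll a Selenium driver to the bottom until no new content loads.

    Hypothetical helper: assumes the site appends posts as you scroll.
    Returns the page source after the last scroll.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    for _ in range(max_scrolls):
        # Jump to the bottom so the site requests the next batch of posts
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give the new posts time to render
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break  # height stopped growing: assume nothing more will load
        last_height = new_height
    return driver.page_source
```

In the question's code, `html = scroll_to_load(driver)` would take the place of `html = driver.page_source`. Note also that BeautifulSoup's `class_` with a multi-word string matches the exact value of the class attribute, so `soup.find_all('shreddit-post')` is a less fragile selector if some posts carry a different class list.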

score 0 · Accepted Answer · answered Aug 12 '23 at 11:30

To get data from Reddit, I suggest using the JSON API it provides (add .json to the end of the URL):

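import requests
from datetime import datetime, timezone


url = "https://old.reddit.com/r/football/.json"  # <-- note the .json at the end

headers = {
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/116.0"
}

response = requests.get(url, headers=headers, timeout=10)
response.raise_for_status()  # fail loudly if the request is throttled or blocked
data = response.json()

for c in data["data"]["children"]:
    # created_utc is a Unix timestamp; utcfromtimestamp() is deprecated since Python 3.12
    t = datetime.fromtimestamp(c["data"]["created_utc"], tz=timezone.utc).strftime(
        "%Y-%m-%d %H:%M:%S"
    )

    # title (first 60 characters), upvotes, creation time
    print(f'{c["data"]["title"][:60]:<60} {c["data"]["ups"]:^5} {t}')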

Prints:

r/Football Random Discussion Thread                            5   2023-08-01 21:00:30
Could have been a Pro footballer?                             59   2023-08-12 08:26:48
Letter from Man Utd's Female Fans Against Greenwood           802  2023-08-11 14:30:36
[Official] Harry Kane joins Bayern Munich                     16   2023-08-12 08:16:42
If Neymar goes back to Barca and wins them the UCL, howhch w  21   2023-08-12 04:10:32
Will Kroos be ever discussed as a legendary player?           68   2023-08-11 20:33:52
Best and worst World Cup songs?                                5   2023-08-12 10:07:16
Do you think Liverpool paid to much for moises caicedo.       159  2023-08-11 10:40:34
Lewandowski vs squarez. Who is the better striker of the las  50   2023-08-11 16:17:22
Brazil vs Argentina youth development                          3   2023-08-12 07:25:37
Why are we (Sweden) so good at the ladies' football, but muc  76   2023-08-11 12:29:55
What are some of the most one sided rivalries?                 7   2023-08-11 23:41:44
My Dream Team according to me (comment your suggestions)(12    0   2023-08-12 08:04:02
Exciting News for Soccer Fans in the United States!            0   2023-08-12 11:09:08
Best player to have played in the English Prem?                5   2023-08-11 21:51:15
What's the worst (or least good) club you believe could win    0   2023-08-12 05:11:53
What happened with Julian Nagelssman?                         194  2023-08-10 23:38:47
Real Madrid fans, which player from current/past Barca do yo  35   2023-08-11 09:13:00
The next era of the goat debate                                0   2023-08-12 02:52:22
Lionel Messi at Inter Miami CF (Match 5)                       1   2023-08-12 02:34:24
I accidentally learned how to knuckleball a football as a be   0   2023-08-12 01:39:38
Another act of racism against black Brazilians in South Amer  12   2023-08-11 12:02:54
Why do people think players need to be good at everything to   0   2023-08-12 01:26:07
Haaland tap in merchant my ass                                 2   2023-08-11 19:52:32
Do you guys think Klaksvik has a chance at qualifying for th   0   2023-08-11 22:46:22
Is this pair of football Boots good? I want good shooting an   1   2023-08-11 22:34:14
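The request above returns a single page of the listing. To go further back, Reddit's listing endpoints accept a `limit` parameter (up to 100) and an `after` parameter holding the fullname of the last post seen; the response's `data.after` field supplies that value for the next request and is `null` on the last page. A minimal sketch under those assumptions follows. It maps each post to the same columns the question builds; `iter_posts`, `post_to_row`, `fetch_json`, and `max_pages` are hypothetical names, and `fetch_json` is passed in so the `requests` call from this answer can be plugged in.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode


def post_to_row(child):
    """Map one listing child to the columns the question's DataFrame uses."""
    d = child["data"]
    return {
        "post_title": d["title"],
        "permalink": d["permalink"],
        "author": d["author"],
        # created_utc is seconds since the Unix epoch
        "timestamp": datetime.fromtimestamp(d["created_utc"], tz=timezone.utc).isoformat(),
        "score": d["score"],
        "domain": d["domain"],
    }


def iter_posts(subreddit, fetch_json, max_pages=4, limit=100):
    """Yield rows page by page, following the listing's `after` cursor.

    fetch_json(url) must return the decoded JSON of one listing page.
    """
    after = None
    for _ in range(max_pages):
        params = {"limit": limit}
        if after:
            params["after"] = after  # continue after the last post seen
        url = f"https://old.reddit.com/r/{subreddit}/.json?{urlencode(params)}"
        listing = fetch_json(url)["data"]
        for child in listing["children"]:
            yield post_to_row(child)
        after = listing.get("after")
        if not after:
            break  # no further pages
```

With the `headers` dict from the answer, `rows = list(iter_posts("football", lambda u: requests.get(u, headers=headers, timeout=10).json()))` collects roughly `max_pages * limit` posts, and `pd.DataFrame(rows).to_csv("football.csv", index=False)` writes them to the kind of sheet the question expected. Reddit listings are generally understood to stop after about 1,000 items, so paging cannot go back indefinitely.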

Can you please explain when you decide to add a header like `User-Agent` to the request? — , Aug 12 '23 at 11:34
@rendezvous Because without it, Reddit will throttle the traffic after a few requests and not send any data. — Andrej Kesely, Aug 12 '23 at 11:35
So does that mean I need to know each website (Reddit, etc.) to decide whether to include the header? — , Aug 12 '23 at 11:36
@rendezvous Every website handles traffic differently, so you have to tailor the code to each one. — Andrej Kesely, Aug 12 '23 at 11:37
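On the throttling point: Reddit's API guidelines ask clients to send a unique, descriptive User-Agent of the form `<platform>:<app ID>:<version> (by /u/<username>)` rather than a browser string, and a throttled client typically receives HTTP 429 Too Many Requests. A minimal sketch using only the standard library sends such a header and backs off on 429 before retrying. The User-Agent value, `get_json`, `retries`, `backoff`, and the injectable `opener` parameter are hypothetical; with `requests`, the equivalent check is `response.status_code == 429`.

```python
import json
import time
import urllib.error
import urllib.request

# Hypothetical descriptive User-Agent: replace the app ID and username with your own.
USER_AGENT = "python:football-scraper:v0.1 (by /u/your_username)"


def get_json(url, retries=3, backoff=2.0, opener=urllib.request.urlopen):
    """GET a JSON URL with a custom User-Agent, retrying on HTTP 429."""
    request = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    for attempt in range(retries + 1):
        try:
            with opener(request, timeout=10) as response:
                return json.load(response)
        except urllib.error.HTTPError as err:
            # Only retry when throttled, and give up after the last attempt
            if err.code != 429 or attempt == retries:
                raise
            time.sleep(backoff * 2 ** attempt)  # exponential backoff: 2 s, 4 s, 8 s, ...
```

`get_json` can also serve as the `fetch_json` argument of a paging helper, so every page request carries the same header and retry policy.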
