
I am trying to scrape posts from a subreddit (vim) and then generate a CSV. However, I only want the data for a specific period (1/2020 to 3/2020).

I don't know how to set this time limit in my code. I set the post limit to 2000, but that only returns posts from the past 2 days.

My code is below. Thank you in advance for any help:


    import praw
    import pandas as pd

    # Create an instance of Reddit.
    reddit = praw.Reddit(client_id='XXXXX',
                         client_secret='XXXXXX',
                         user_agent='XXXX')

    # Scrape posts from the vim subreddit.
    posts = reddit.subreddit('vim').hot(limit=2000)

    # Specify the rows and columns.
    c = ['title', 'name', 'url', 'score', 'locked', 'created', 'num of comment', 'upvote ratio']
    rows = ([post.title, post.name, post.url, post.score, post.locked,
             post.created, post.num_comments, post.upvote_ratio] for post in posts)
    df = pd.DataFrame(rows, columns=c)

    # Write the CSV.
    df.to_csv('vim_subreddit.csv')

shirin

1 Answer


Have a look at these docs: https://praw.readthedocs.io/en/latest/code_overview/models/subreddit.html#subreddit

From the examples there, it looks like the limit parameter specifies how many records you would like to retrieve, not a time range to search within.

You might want to give this post and the accepted answer a read: PRAW 6: Get all submission of a subreddit

They say that it is no longer possible to perform a search based on timestamp with PRAW or any other Reddit API client. This includes trying to get all posts from a subreddit.

They also provide some alternatives.
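If the posts you can still retrieve happen to reach back far enough, one option is to filter them by timestamp on your side. Below is a minimal sketch of that idea, not a definitive solution. It assumes each post object exposes a `created_utc` attribute in Unix seconds, as PRAW submissions do. The `in_window` helper and the stand-in sample objects are hypothetical names made up for this illustration. As far as I know, Reddit listings only go back roughly 1000 items, so for an active subreddit this may never reach early 2020.

```python
from datetime import datetime, timezone
from types import SimpleNamespace


def in_window(posts, start, end):
    """Yield posts whose created_utc falls in the half-open range [start, end)."""
    start_ts = start.timestamp()
    end_ts = end.timestamp()
    for post in posts:
        if start_ts <= post.created_utc < end_ts:
            yield post


def ts(year, month, day):
    """Unix timestamp for midnight UTC on the given date."""
    return datetime(year, month, day, tzinfo=timezone.utc).timestamp()


# Window from the question: January through March 2020 (UTC).
START = datetime(2020, 1, 1, tzinfo=timezone.utc)
END = datetime(2020, 4, 1, tzinfo=timezone.utc)

# Stand-in objects for illustration only. With PRAW you would pass
# something like reddit.subreddit('vim').new(limit=None) instead.
sample = [
    SimpleNamespace(title='too old', created_utc=ts(2019, 12, 31)),
    SimpleNamespace(title='window start', created_utc=ts(2020, 1, 1)),
    SimpleNamespace(title='in range', created_utc=ts(2020, 2, 15)),
    SimpleNamespace(title='window end', created_utc=ts(2020, 4, 1)),
]

kept = [post.title for post in in_window(sample, START, END)]
print(kept)
```

The filtered posts can then go into the same DataFrame construction as in the question. The end of the window is exclusive, so a post created exactly at midnight on 1 April is left out.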

Ben