I am new to this (first question). I have a huge news articles dataset (available on Kaggle: https://www.kaggle.com/snapcrack/all-the-news) with hundreds or even thousands of articles for each day. The number of articles per day is not consistent.
I need to take a sample of news articles (let's say 20) for every day in the dataset, to reduce its size and to have a consistent number of articles per day. I then want to use the sample for further predictive analysis along with another dataset.
So my first question is: how can I sample/subset the dataset based on date? I know how to sample a dataset in general, but not how to do so consistently so that I get articles from each day. I guess it will need a function, as the dataset has articles spanning three years, so it will have to be run over that whole period.
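To make the first question concrete, here is a minimal base R sketch of the kind of per-day sampling I mean. The toy data frame and the column names `date` and `title` are assumptions for illustration, not the real Kaggle columns, and days with fewer than 20 articles simply keep everything they have.

```r
set.seed(42)  # make the random sampling reproducible

# Toy stand-in for the real data: 3 days with 50, 5 and 30 articles.
# The column names `date` and `title` are assumptions.
articles <- data.frame(
  date  = as.Date("2016-01-01") + rep(0:2, times = c(50, 5, 30)),
  title = paste("article", 1:85),
  stringsAsFactors = FALSE
)

n_per_day <- 20

# Group the row indices by day, then draw up to n_per_day indices per day.
rows_by_day <- split(seq_len(nrow(articles)), articles$date)
keep <- unlist(lapply(rows_by_day, function(idx) {
  idx[sample.int(length(idx), min(n_per_day, length(idx)))]
}))

# Keep the sampled rows in their original order.
daily_sample <- articles[sort(keep), ]

# Number of sampled articles per day.
table(daily_sample$date)
```

Because the grouping is done on the date column itself, this should cover the full three years without looping over dates by hand.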
Secondly, is it possible to show the sample for each day in a single row, i.e. one article per column?
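For the second question, this is a rough base R sketch of the layout I have in mind: one row per day and one column per sampled article. Again, the toy data, the column names `date` and `title`, and the output names `article_1`, `article_2`, ... are assumptions for illustration; days with fewer articles than columns are padded with `NA`.

```r
# Toy stand-in for an already sampled dataset: 3 articles on day 1, 2 on day 2.
# The column names `date` and `title` are assumptions.
daily_sample <- data.frame(
  date  = as.Date("2016-01-01") + rep(0:1, times = c(3, 2)),
  title = paste("article", 1:5),
  stringsAsFactors = FALSE
)

n_per_day <- 3

# Collect the titles for each day, padding short days with NA so that
# every day yields exactly n_per_day values.
titles_by_day <- split(daily_sample$title, daily_sample$date)
mat <- t(vapply(titles_by_day,
                function(x) x[seq_len(n_per_day)],
                character(n_per_day)))

# One row per day, one column per article.
wide <- data.frame(date = as.Date(rownames(mat)), unname(mat),
                   stringsAsFactors = FALSE)
names(wide)[-1] <- paste0("article_", seq_len(n_per_day))

wide
```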
I am currently using RStudio. Given that this is my first post, I cannot post pictures.