0

I am not entirely new to R but I could use a hand in this one :)

So, I have a dataset on around 14k different stocks over the past 20 years. Now, clearly, some of these stocks were not traded for the entire time period. I am trying to clean my data so that for any stock if in a particular month it was not traded at least one day, the entire month of data for this (only) stock will be deleted. That is, I want to keep the data for every stock only if it was traded every day in a particular month. I hope this is clear!

It will be really great if you could at least guide me to the packages/functions that I can use to do this filtering beautifully.

Thanks!

Example:

mydata <- structure(list(permno = c(10001L, 10001L, 10001L, 10001L, 10001L, 
10001L, 10001L, 10001L, 10001L, 10001L), date = structure(c(7306, 
7307, 7308, 7309, 7312, 7313, 7314, 7315, 7316, 7319), class = "Date"), 
    sic = c(4920, 4920, 4920, 4920, 4920, 4920, 4920, 4920, 4920, 
    4920), price = c(3.375, 3.33333333333, NA, 3.35416666667, 
    3.375, 3.375, 3.375, 3.375, 3.375, NA), cap = c(10347750, 
    10220000, 10220000, 10283875, 10347750, 10347750, 10347750, 
    10347750, 10347750, 10092250)), row.names = c(NA, 10L), class = "data.frame")

mydata

permno refers to a particular stock, and for each of them, I only want to keep the data for months in which there is no n/a.

Tamerbei
  • 3
  • 2
  • Please provide an example of your dataset. – www Feb 07 '21 at 01:10
  • @www please see the pic – Tamerbei Feb 07 '21 at 01:18
  • I don't have time to re-create a dataset based on your image. See this to learn how to make a reproducible example (https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). – www Feb 07 '21 at 01:21
  • Sorry, guys, I hope this is what you were asking for :) – Tamerbei Feb 07 '21 at 01:42
  • So if the stock is not traded on a particular date you have `price` as `NA` ? – Ronak Shah Feb 07 '21 at 01:54
  • @RonakShah yeah, it is either NA or zero. Thank you for your answer! I think I will be able to alter it to work with my dataset. You really helped me understand R better:) – Tamerbei Feb 07 '21 at 02:02

1 Answers1

0

Separate year, month and day from date. For each stock in each month keep only the data if the stock is traded everyday of the month.

library(dplyr)
library(lubridate)

result <- mydata %>%
            mutate(year = year(date), 
                   month = month(date), 
                   day = day(date)) %>%
            group_by(permno, year, month) %>%
            filter(all(!is.na(price)))
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213