0

I have a dataset containing prices of multiple stocks on multiple dates (similar to the below one ) . I am trying to extract all the dates on which individual stocks did not trade (i.e. NA instead of numeric value) so the result would match the format "stock|date". I'm not sure how to approach this since i have a mixutre of headers (date) and the actual records (stock codes) in the first row and the only filtering i've ever done in R was using 'subset' function on data sets with proper column names. My real data set will have approx 2000 columns and 30 rows. Can anyone suggest an efficient way of extracting the NAs?

Data:

date    Stock1  Stock2  Stock3  Stock4  Stock5
01-Jan  10  20  30  40  NA
02-Jan  10  NA  30  40  NA
03-Jan  10  20  30  40  NA
04-Jan  10  20  NA  40  NA
05-Jan  10  20  30  40  50
06-Jan  10  20  30  NA  50
07-Jan  NA  20  30  NA  50

Result:

stock1  07-Jan              
Stock2  02-Jan              
Stock3  04-Jan              
Stock4  06-Jan              
Stock4  07-Jan              
Stock5  01-Jan              
Stock5  02-Jan              
Stock5  03-Jan              
Stock5  04-Jan              
neilfws
  • 32,751
  • 5
  • 50
  • 63
bartek
  • 11
  • 1
  • to remove all NAs. complete.cases(...) – Andrew Bannerman Mar 19 '18 at 22:02
  • 1
    Look for `?gather` – MKR Mar 19 '18 at 22:04
  • 1
    `complete.cases()` won't do what the OP appears to want (based on stated expected result) – De Novo Mar 19 '18 at 22:05
  • [Reshaping data.frame from wide to long format](https://stackoverflow.com/questions/2185252/reshaping-data-frame-from-wide-to-long-format) – Henrik Mar 19 '18 at 22:08
  • to clarify - i am trying to construct a list of dates for each stock when instead of price it had NA. For example looking at the 1st of Jan all stocks have prices instead of Stock 5 - so the result from 1st row will be Stock5 : 1 Jan. – bartek Mar 19 '18 at 22:13

1 Answers1

1

One solution could be use tidyr::gather and then filter is for value with NA.

  # Sample data 
  df <- read.table(text = "date    Stock1  Stock2  Stock3  Stock4  Stock5
  01-Jan  10  20  30  40  NA
  02-Jan  10  NA  30  40  NA
  03-Jan  10  20  30  40  NA
  04-Jan  10  20  NA  40  NA
  05-Jan  10  20  30  40  50
  06-Jan  10  20  30  NA  50
  07-Jan  NA  20  30  NA  50", header = T)

  library(tidyverse)
  gather(df, key, value, -date) %>% 
    filter(is.na(value)) %>%
    select(key, date)

  # key   date
  # 1 Stock1 07-Jan
  # 2 Stock2 02-Jan
  # 3 Stock3 04-Jan
  # 4 Stock4 06-Jan
  # 5 Stock4 07-Jan
  # 6 Stock5 01-Jan
  # 7 Stock5 02-Jan
  # 8 Stock5 03-Jan
  # 9 Stock5 04-Jan
MKR
  • 19,739
  • 4
  • 23
  • 33